[Android RPC] Stability Issues


#1

I’ve been tuning models on a Galaxy S10 using the Android RPC app, but I’m having trouble keeping the app alive. Everything goes smoothly at first and I can tune 1 or 2 layers, but eventually the app crashes, sometimes even causing the phone to restart. I have quite a few workloads that need to be tuned, so constantly babysitting the app isn’t a great option. Has anyone seen this behavior before, or does anyone have ideas for workarounds?


#2

I’m still unable to run the RPC server consistently on Galaxy phones (for both CPU and GPU tuning), although it works fine on Pixels. Unfortunately, Pixel phones don’t have OpenCL, which is the target I’m interested in testing. @yzhliu, which OpenCL-enabled phone did you use when testing the Android RPC server?


#3

cc @eqy: as far as I know, we do have some fault-tolerance schemes.


#4

In my experience, Android RPC stability is a difficult beast to tame. Our current RPC tuning app uses a watchdog and a separate process + activity (as a “workhorse”) for each kernel configuration to be tuned. However, this doesn’t isolate crashes perfectly, especially when other components such as the OpenCL driver are involved.

One workaround I have been considering is yet another level of watchdog. If you have a machine with a spare USB port and adb (or similar tools) installed, it’s probably feasible to write a small watchdog that periodically checks whether the RPC app on the phone has crashed and restarts it if so.
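For reference, here is a rough, untested-on-device sketch of what such a host-side watchdog could look like in Python. Assumptions: the package name `org.apache.tvm.tvmrpc` is a guess (it has differed between TVM versions, so check with `adb shell pm list packages`), the phone’s shell provides `pidof` (recent Android versions do), and the app is relaunched through its launcher activity with `monkey`. Relaunching the app does not by itself guarantee that it reconnects to the tracker; that depends on the app’s saved settings.

```python
import subprocess
import time
from typing import Callable, List, Optional, Tuple

# ASSUMPTION: package name of the RPC app; adjust to match your build.
PACKAGE = "org.apache.tvm.tvmrpc"
POLL_SECONDS = 30.0

# A runner takes adb arguments and returns (exit code, stdout).
Runner = Callable[[List[str]], Tuple[int, str]]


def adb(args: List[str], serial: str = "") -> Tuple[int, str]:
    """Run an adb command on the host and return (exit code, stdout)."""
    cmd = ["adb"] + (["-s", serial] if serial else []) + args
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    return proc.returncode, proc.stdout


def app_is_running(run: Runner, package: str = PACKAGE) -> bool:
    # `pidof` prints the PID(s) of a running process. Older adb versions
    # always exit with 0, so also require non-empty output.
    code, out = run(["shell", "pidof", package])
    return code == 0 and out.strip() != ""


def restart_app(run: Runner, package: str = PACKAGE) -> None:
    # Block until the device is back (it may have rebooted), then start the
    # app's launcher activity via monkey, which needs no activity name.
    run(["wait-for-device"])
    run(["shell", "monkey", "-p", package,
         "-c", "android.intent.category.LAUNCHER", "1"])


def check_once(run: Runner, package: str = PACKAGE) -> bool:
    """Relaunch the app if it is not running; return True if relaunched."""
    if app_is_running(run, package):
        return False
    restart_app(run, package)
    return True


def watch(run: Runner, package: str = PACKAGE,
          interval: float = POLL_SECONDS,
          max_polls: Optional[int] = None) -> int:
    """Poll forever (or max_polls times); return the number of relaunches."""
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            if check_once(run, package):
                restarts += 1
                print(time.strftime("%H:%M:%S"), "RPC app down; relaunched")
        except subprocess.TimeoutExpired:
            # adb hung, e.g. while the phone is still rebooting; retry later.
            print("adb timed out; will retry")
        polls += 1
        time.sleep(interval)
    return restarts
```

Calling `watch(adb)` from a terminal on the host would then poll every 30 seconds until interrupted. Passing the adb runner in as a function keeps the check/restart logic easy to exercise with a fake runner, without a phone attached.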


#5

Maybe we need an RPC server built on the C++ API.