Unable to reproduce benchmark results on ARM CPU

I am trying to run MobileNet on an ARM CPU, and I found a benchmark result for MobileNet in the TVM wiki. According to it, MobileNetV2 takes 29.3 ms on a Google Pixel 2, which has a Snapdragon 835. Since I use an LG G8, which has a Snapdragon 855, I expected a better result for MobileNetV2. However, it takes 70.16 ms when I run the MobileNetV2 TensorFlow graph on the ARM CPU (I used XGBTuner with n_trial = 1000). I also tried running without tuning, which took 77.22 ms for the TFLite model and 81.22 ms for the TensorFlow graph.

I used the target below

llvm -device=arm_cpu -model=snapdragon835 -target=aarch64-linux-android -mcpu=kryo -mattr=+neon

as suggested in this post.

What else can I try to reproduce the benchmark results?
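For reference, my tuning flow looks roughly like this. This is a sketch rather than my exact script: the RPC tracker key `"android"`, the log file name, and the use of `relay.testing`'s MobileNet as a stand-in for my imported TensorFlow graph are all placeholders.

```python
import tvm
from tvm import relay, autotvm
from tvm.autotvm.tuner import XGBTuner
from tvm.relay import testing

# Target string from above.
target = ("llvm -device=arm_cpu -model=snapdragon835 "
          "-target=aarch64-linux-android -mcpu=kryo -mattr=+neon")

# Placeholder model: in my real script, mod/params come from
# relay.frontend.from_tensorflow / from_tflite.
mod, params = testing.mobilenet.get_workload(batch_size=1)

# Extract tunable conv2d/dense tasks from the graph.
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    # Cross-compile with the Android NDK (TVM_NDK_CC must be set).
    builder=autotvm.LocalBuilder(build_func="ndk"),
    # Measure on the phone through an RPC tracker; "android" is my
    # tracker key (placeholder).
    runner=autotvm.RPCRunner("android", host="0.0.0.0", port=9190,
                             number=10, timeout=10),
)

for task in tasks:
    tuner = XGBTuner(task, loss_type="rank")
    tuner.tune(n_trial=1000,
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file("tuning.log")])

# Compile with the best configs found during tuning.
with autotvm.apply_history_best("tuning.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```

The resulting library is then exported with the NDK toolchain and timed on the device with `time_evaluator`.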

I also tried tuning MobileNetV3 on the ARM CPU. But strangely, inference takes longer after tuning. Here are my experiment results.

  • Inference time without tuning

    • TFLite model: 28.76 ± 0.99 ms
    • TensorFlow graph: 31.08 ± 0.99 ms
  • Inference time with tuning

    • TFLite model: 29.41 ± 2.44 ms
    • TensorFlow graph: 34.75 ± 2.22 ms

Both the TFLite model and the TensorFlow graph take longer after tuning. I tried different tuners (such as GATuner), but inference still takes longer. Is there any reason for this result?