Unable to reproduce benchmark results on ARM CPU

I am trying to run MobileNet on an ARM CPU, and I found a benchmark result for MobileNet in the TVM wiki. According to it, MobileNetV2 takes 29.3 ms on a Google Pixel 2, which has a Snapdragon 835. Since I use an LG G8, which has a Snapdragon 855, I expected a better result for MobileNetV2. However, it takes 70.16 ms when I run the MobileNetV2 TensorFlow graph on the ARM CPU (I used XGBTuner with n_trial = 1000). I also tried running without tuning, which took 77.22 ms for the TFLite model and 81.22 ms for the TensorFlow graph.

I used the target below

llvm -device=arm_cpu -model=snapdragon835 -target=aarch64-linux-android -mcpu=kryo -mattr=+neon

as suggested in this post.

What else can I try to reproduce the benchmark results?
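For reference, my tuning flow looks roughly like this. This is a sketch rather than my exact script: the RPC tracker key `"android"`, the log file name, and the use of `relay.testing`'s MobileNet as a stand-in for my imported TensorFlow graph are all placeholders.

```python
import tvm
from tvm import relay, autotvm
from tvm.autotvm.tuner import XGBTuner
from tvm.relay import testing

# Target string from above.
target = ("llvm -device=arm_cpu -model=snapdragon835 "
          "-target=aarch64-linux-android -mcpu=kryo -mattr=+neon")

# Placeholder model: in my real script, mod/params come from
# relay.frontend.from_tensorflow / from_tflite.
mod, params = testing.mobilenet.get_workload(batch_size=1)

# Extract tunable conv2d/dense tasks from the graph.
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    # Cross-compile with the Android NDK (TVM_NDK_CC must be set).
    builder=autotvm.LocalBuilder(build_func="ndk"),
    # Measure on the phone through an RPC tracker; "android" is my
    # tracker key (placeholder).
    runner=autotvm.RPCRunner("android", host="0.0.0.0", port=9190,
                             number=10, timeout=10),
)

for task in tasks:
    tuner = XGBTuner(task, loss_type="rank")
    tuner.tune(n_trial=1000,
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file("tuning.log")])

# Compile with the best configs found during tuning.
with autotvm.apply_history_best("tuning.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```

The resulting library is then exported with the NDK toolchain and timed on the device with `time_evaluator`.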

I also tried tuning MobileNetV3 on the ARM CPU. But strangely, inference takes longer after tuning. Here are my experiment results.

  • Inference time without tuning

    • TFLite model: 28.76 ± 0.99 ms
    • TensorFlow graph: 31.08 ± 0.99 ms
  • Inference time with tuning

    • TFLite model: 29.41 ± 2.44 ms
    • TensorFlow graph: 34.75 ± 2.22 ms

Both the TFLite model and the TensorFlow graph take longer after tuning. I tried different tuners (such as GATuner), but inference still takes longer. Is there any reason for this result?