Native inference performance on ARM device

Hello,

I’m deploying TVM to a Linux platform.

As an initial bring-up, I managed to deploy the TVM runtime and a module compiled by the NNVM compiler to an ARM device.
On the ARM device I ran an inference test (the test app is written in C++) for ResNet18_v1 and SqueezeNet1.1.
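For context, the host-side compile step with the NNVM compiler looks roughly like the sketch below; the model name, input shape, target strings and output file names are illustrative assumptions rather than my exact setup.

```python
# Rough sketch of the host-side NNVM compile step targeting a Mali GPU.
# Model, shapes, target strings and file names are assumptions for illustration.
import nnvm.frontend
import nnvm.compiler
import tvm
from tvm.contrib import cc
from mxnet.gluon.model_zoo.vision import get_model

block = get_model("resnet18_v1", pretrained=True)
sym, params = nnvm.frontend.from_mxnet(block)

shape_dict = {"data": (1, 3, 224, 224)}
target = tvm.target.mali()                      # Mali GPU via OpenCL
target_host = "llvm -target=aarch64-linux-gnu"  # host code for the ARM CPU

graph, lib, params = nnvm.compiler.build(
    sym, target=target, target_host=target_host,
    shape=shape_dict, params=params)

# These artifacts are what the C++ test app loads on the device.
lib.export_library("deploy_lib.so", cc.create_shared,
                   cc="aarch64-linux-gnu-g++")
with open("deploy_graph.json", "w") as f:
    f.write(graph.json())
with open("deploy_params.bin", "wb") as f:
    f.write(nnvm.compiler.save_param_dict(params))
```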

ARM(Exynos5433) device info:

  • CPU : 1.9GHz Quad-Core (Cortex®-A57) + 1.3GHz Quad-Core (Cortex®-A53)
  • GPU : Mali™-T760 MP6

The measured performance on the ARM device is as follows:

  • MXNet-based ResNet18_v1 inference: CPU about 280 ms, GPU about 36 ms
  • MXNet-based SqueezeNet1.1 inference: CPU about 133 ms, GPU about 4.7 ms

For accurate measurement, I set the CPU governor to performance mode.
As for the GPU performance, the result is surprising to me, even though the output gives the correct label.

Does this result make sense?

Thanks,
Inki Dae

Yes, they make sense. The GPU is way faster than the CPU.

I know the GPU is generally faster than the CPU. However, in this case the GPU seems too much faster than the CPU.

Yes, it's possible, depending on the GPU configuration (threads and blocks).

Have you used AutoTVM to tune for the ARM CPU? https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_arm.html
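The flow in that tutorial is roughly as follows; here sym, target and target_host are the NNVM symbol and targets you already use for building, while the device key, tracker address and trial budget are placeholders you would replace with your own.

```python
# Condensed sketch of the AutoTVM tuning flow from the tune_nnvm_arm tutorial.
# "sym", "target" and "target_host" come from the model build; the device key,
# tracker address and trial budget are placeholders.
import nnvm
import nnvm.compiler
from tvm import autotvm

tasks = autotvm.task.extract_from_graph(
    sym, shape={"data": (1, 3, 224, 224)}, dtype="float32",
    symbols=(nnvm.sym.conv2d,), target=target, target_host=target_host)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.RPCRunner("exynos5433", host="0.0.0.0", port=9190,
                             number=4, repeat=3, timeout=10))

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task, loss_type="rank")
    tuner.tune(n_trial=min(1000, len(task.config_space)),
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file("tuning.log")])

# Rebuild the model with the best schedules found during tuning.
with autotvm.apply_history_best("tuning.log"):
    graph, lib, params = nnvm.compiler.build(
        sym, target=target, target_host=target_host,
        shape={"data": (1, 3, 224, 224)}, params=params)
```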

Your GPU results are suspicious.

You should compare your numbers with our benchmark here: https://github.com/dmlc/tvm/wiki/Benchmark#mobile-gpu. You can compare the frequency and number of cores of the GPU.

You can also validate it through RPC using our benchmark script, which is easier and more accurate.
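A measurement over RPC, in the spirit of the scripts under apps/benchmark, looks roughly like this; the tracker address, device key and the graph/lib/params produced by the compile step are assumptions here.

```python
# Sketch of timing one model over RPC with time_evaluator, similar in spirit
# to the apps/benchmark scripts. Tracker address, device key and the
# graph/params from the compile step are assumed.
import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import graph_runtime

tracker = rpc.connect_tracker("0.0.0.0", 9190)
remote = tracker.request("exynos5433")

remote.upload("deploy_lib.so")
rlib = remote.load_module("deploy_lib.so")

ctx = remote.cl(0)  # Mali GPU via OpenCL; use remote.cpu(0) for the CPU case
module = graph_runtime.create(graph, rlib, ctx)
module.set_input(**params)
module.set_input("data", tvm.nd.array(
    np.random.uniform(size=(1, 3, 224, 224)).astype("float32")))

# time_evaluator runs the whole graph repeatedly and reports statistics,
# which avoids the noise of timing a single run.
ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
prof_res = np.array(ftimer().results) * 1000  # convert to milliseconds
print("Mean inference time: %.2f ms (std %.2f ms)" % (prof_res.mean(), prof_res.std()))
```

Timing the same library on remote.cl(0) and remote.cpu(0) back to back gives a direct comparison against the numbers in the wiki table.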

Thanks for the reply.
I will try to validate it using the benchmark script.

Thanks,
Inki Dae

By the way, have you tested inference on a real device with C++ code, rather than via RPC with Python code?
As I mentioned above, the output gives the correct label, and the response time is much faster than on the CPU. Anyway, I will also test other models to make sure.

Thanks,
Inki Dae

I’ve never used AutoTVM.

I think you should tune it with AutoTVM to get a better result.

Thanks for the advice. :slight_smile:

hi,

How can I run AutoTVM when my ARM platform has no Python environment, which means I can’t use RPC?