Armv7-a performance slower than other deep learning frameworks

sanallen · June 6, 2018, 1:48am

I tested TVM with mobilenet_v1_1.0_224 on an ARM Cortex-A53 (quad-core, 1.8 GHz). The results are:
4 Threads
Caffe2 302ms
ncnn 280ms
tensorflow-lite 200ms
tvm 141ms

Here TVM performs better than the other deep learning frameworks. But I also tested on an ARM Cortex-A7 (quad-core, 1.8 GHz), and the results are:
4 Threads
Caffe2 562ms
ncnn 443ms
tensorflow-lite 563ms
tvm 687ms
On the ARM Cortex-A7, TVM performs worse than the other deep learning frameworks.
@tqchen @merrymercy Hi, do you have any ideas or suggestions?
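
To put the gap in numbers, the latencies above can be turned into ratios relative to tvm. This is only a small sketch; the helper name `speedup_of_tvm` is made up for illustration, and the figures are copied from the two lists in this post.

```python
# Latencies in ms, copied from the post (4 threads, mobilenet_v1_1.0_224).
latency_ms = {
    "Cortex-A53": {"Caffe2": 302, "ncnn": 280, "tensorflow-lite": 200, "tvm": 141},
    "Cortex-A7": {"Caffe2": 562, "ncnn": 443, "tensorflow-lite": 563, "tvm": 687},
}


def speedup_of_tvm(results):
    """Return other_latency / tvm_latency per framework (hypothetical helper).

    A ratio above 1 means tvm is faster than that framework; below 1, slower.
    """
    tvm = results["tvm"]
    return {name: ms / tvm for name, ms in results.items() if name != "tvm"}


for cpu, results in latency_ms.items():
    for name, ratio in speedup_of_tvm(results).items():
        print(f"{cpu}: tvm vs {name}: {ratio:.2f}x")
```

Ratios above 1 on the A53 and below 1 on the A7 restate the observation: the same schedules that win on one core lose on the other.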

merrymercy · June 6, 2018, 11:31am

Because there are some constant tiling factors in the schedule:

https://github.com/dmlc/tvm/blob/04c756237b89bafdab63936756a7668ecdc90883/topi/python/topi/rasp/conv2d.py#L16


from .. import tag
from ..nn.conv2d import conv2d as _conv2d, _get_schedule
from ..nn.conv2d import SpatialPack, Im2ColPack
from ..nn.conv2d import _WORKLOADS, _SCH_TO_DECL_FUNC
from ..nn.conv2d import _get_workload
from ..nn.util import infer_pad, infer_stride
from .. import generic


_SCHEDULES = [
    # float32 imagenet
    SpatialPack(1, 8, 4, 1, 4, True),
    SpatialPack(1, 7, 4, 2, 4, True),
    SpatialPack(1, 4, 8, 4, 1, True),
    SpatialPack(1, 4, 4, 1, 16, False),
    SpatialPack(1, 4, 8, 4, 8, False),
    SpatialPack(1, 7, 4, 3, 8, True),
    SpatialPack(1, 2, 8, 1, 8, True),
    SpatialPack(2, 1, 16, 1, 4, True),
    SpatialPack(1, 7, 4, 1, 1, True),
    Im2ColPack(7, 4, 1, 16, True),
    Im2ColPack(7, 4, 1, 8, False),

They were tuned on rasp (a Raspberry Pi), so they suit the A53 but might not be good for the A7. To get good performance on the A7, you need to tune these factors on your own device. A simple grid search over these values is a reasonable place to start.
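
A simple grid search of this kind could be sketched as follows. This is not the way the factors in `_SCHEDULES` were actually obtained; it only illustrates the idea. The field names (`vh`, `vw`, `vc`, `unroll`), the candidate values, and the `synthetic_cost` function are all assumptions made for the example. In real use, the cost function would build the conv2d with the candidate factors, run it on the A7, and return the measured time.

```python
import itertools


def grid_search(measure, space):
    """Evaluate every combination in `space` and return (best_config, best_cost).

    `measure` takes a dict of factor values and returns a cost (lower is better).
    """
    names = list(space)
    best_cfg, best_cost = None, float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        cost = measure(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


# Hypothetical search space; the names and ranges are assumptions and should
# be checked against the schedule definition before being used for real.
SPACE = {
    "vh": [1, 2, 4, 7],
    "vw": [1, 2, 4, 8],
    "vc": [4, 8, 16],
    "unroll": [True, False],
}


def synthetic_cost(cfg):
    """Stand-in for an on-device measurement (NOT a real timing model)."""
    penalty = 0.0 if cfg["unroll"] else 0.5
    return abs(cfg["vh"] - 2) + abs(cfg["vw"] - 4) + abs(cfg["vc"] - 8) / 4 + penalty


best_cfg, best_cost = grid_search(synthetic_cost, SPACE)
print(best_cfg, best_cost)
```

The search is exhaustive, so its cost grows with the product of the candidate list sizes (96 combinations here). Keeping each list short matters when every measurement is a real run on the device.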

Or you can wait a while and try our auto-tuner (Bringing auto-tuner to tvm). We plan to release it in several weeks.

FrozenGene · July 26, 2018, 4:14pm

@merrymercy
Could you give us some advice on how to tune these factors? For example, how did you get these factors on the A53? If we want to tune for the A7 or A9, where do we start? Thanks in advance.

merrymercy · July 26, 2018, 6:50pm

You are lucky. I just sent this PR: https://github.com/dmlc/tvm/pull/1487.
With it, we can auto-tune any graph for any ARM CPU.
There is a tutorial on how to tune the parameters yourself.

FrozenGene · July 26, 2018, 5:36pm

@merrymercy I looked at it just now, and I find it only tunes for some devices. If my device doesn’t support RPC (for example, my remote device doesn’t have a Python environment and cannot set up an RPC environment), all I can do is export the nnvm graph model / lib.so / param weights and run them on the device. How can I leverage your work to tune for it? Thanks in advance.

tqchen · July 26, 2018, 6:07pm

@FrozenGene Actually, I would strongly encourage you to support RPC on your device. Note that RPC is not tied to Python: we have RPC servers implemented in Java (runs on Android) and Objective-C (runs on iOS), and most of the logic is in C++.

See https://github.com/dmlc/tvm/tree/master/apps/android_rpc

FrozenGene · July 26, 2018, 6:15pm

@tqchen Thanks for pointing out that RPC is not tied to Python. My ARM device is not Android; it is just an ARM CPU running Linux, without Python, Java, and so on. I know it is strange, but that is the situation, and I cannot install them either. So I wonder whether there is a pure C++ RPC server that I can cross-compile on my x86 Linux server and run on my ARM device. Then I think I could reproduce your setup and use it for tuning. Thanks in advance.

tqchen · July 26, 2018, 6:50pm

It is possible to build a C++ RPC server as well. Note that most of the RPC logic is in C++, and we only use Python/Java for the basic handshake logic (reporting to the RPC tracker) and for accepting connections.

I think it should be easy to create a C++ version of the RPC logic that encapsulates this wrapping as well. Please let me know if you are interested in contributing to this. To get started, you can take a look at android_rpc and the iOS RPC app; you will find that only a very thin layer of logic needs to be moved to C++.

FrozenGene · July 26, 2018, 6:39pm

I will look at it first. If I complete it, I will contribute. Thanks.

tqchen · July 26, 2018, 7:08pm