[TOPI] Using x86 schedules for ARM conv2d

kindlehe · April 16, 2020, 6:52am

Thanks a lot! I am very glad that you can share a script to help me get started. You can send me a message through the website's messaging feature.

anijain2305 · April 17, 2020, 12:14am

I used the NHWC schedule to tune mobilenet. The result is as follows:

Network	TVM NCHWc (ms)	TFLite NHWC (ms)
mobilenet-v1	72.46	210.00

It seems the current NHWC schedule requires deeper investigation. We lack an NHWC depthwise schedule. I also saw some warnings/errors while compiling/autotuning. I am listing them here since they can point to the next steps. @FrozenGene can you take a look at improving the NHWC schedule?

Compilation

AlterOpLayout issue - https://github.com/apache/incubator-tvm/pull/5350. It might be possible to hide the kernel layout change.

Auto-tuning

Detect vectorize inside vectorized loop, ignoring…
Large unroll factor - result: MeasureResult(costs=(InstantiationError(['Too large factor for unrolling', 'Too large factor for unrolling'],),), error_no=1
Timeout error - result: MeasureResult(costs=(TimeoutError(),), error_no=6
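To see which of these failures dominates a tuning run, the error_no values in the MeasureResult lines can be tallied. The following is a minimal standalone sketch, not part of TVM: the name table mirrors AutoTVM's MeasureErrorNo as I understand it around the time of this thread (an assumption, please check it against your TVM version), and `tally_errors` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed mapping, mirroring tvm.autotvm.measure.MeasureErrorNo (early 2020).
# Verify against your TVM version before relying on it.
ERROR_NAMES = {
    0: "NO_ERROR",
    1: "INSTANTIATION_ERROR",  # e.g. 'Too large factor for unrolling'
    2: "COMPILE_HOST",
    3: "COMPILE_DEVICE",
    4: "RUNTIME_DEVICE",
    5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT",
    7: "RUN_TIMEOUT",
    8: "UNKNOWN_ERROR",
}

ERROR_NO_RE = re.compile(r"error_no=(\d+)")


def tally_errors(lines):
    """Count how often each error_no appears in AutoTVM debug output lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_NO_RE.search(line)
        if match is None:
            continue  # line carries no MeasureResult error code
        code = int(match.group(1))
        counts[ERROR_NAMES.get(code, "UNRECOGNIZED_%d" % code)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines shaped like the output quoted above.
    sample = [
        "result: MeasureResult(costs=(InstantiationError([...],),), error_no=1",
        "result: MeasureResult(costs=(TimeoutError(),), error_no=6",
        "result: MeasureResult(costs=(TimeoutError(),), error_no=6",
    ]
    for name, count in tally_errors(sample).most_common():
        print(name, count)
```

Under that assumed mapping, error_no=1 is a config rejected at instantiation time and error_no=6 is a timeout during the build, as opposed to a timeout while running on the device.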

I also created a quick tutorial here - https://github.com/apache/incubator-tvm/pull/5354

This is a tutorial on tuning a TFLite model for ARM CPUs. It is largely based on two previous tutorials.

Compile TFLite Models - https://docs.tvm.ai/tutorials/frontend/from_tflite.html#sphx-glr-tutorials-frontend-from-tflite-py
Auto-tuning a convolutional network for ARM CPUs - https://docs.tvm.ai/tutorials/autotvm/tune_relay_arm.html#sphx-glr-tutorials-autotvm-tune-relay-arm-py

Actually, it is mostly a copy-paste of the 2 tutorials. The only interesting change is this - https://github.com/apache/incubator-tvm/pull/5354/files#r409919577

I am not sure if we need a new tutorial that is 90% the same as the previous tutorials. @tqchen do you have any comments?

@kindlehe you can use the script in the tutorial to get started.

FrozenGene · April 17, 2020, 6:19am

sure.

This should be an NHWC schedule problem.

It is normal, because we have max_unroll to restrict it.

It is normal. When we unroll, we sometimes hit a build timeout. If the schedule is not good, we hit a runtime timeout error. I think it is acceptable.
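As a rough illustration of the max_unroll restriction mentioned above: a candidate config that annotates an axis longer than the limit for unrolling is rejected before it is built or measured, which is why those trials show up as instantiation errors. The sketch below is a simplified standalone model of that check, not TVM's actual code; `check_unroll`, its parameters, and the local `InstantiationError` class are hypothetical stand-ins.

```python
class InstantiationError(ValueError):
    """Stand-in for the error AutoTVM reports when a config cannot be instantiated."""


def check_unroll(axis_lens, annotations, max_unroll=None):
    """Reject a candidate config whose unrolled axes exceed max_unroll.

    axis_lens   -- length of each annotated loop axis
    annotations -- one of "unroll", "vec", "none" per axis
    max_unroll  -- largest axis length allowed to be unrolled (None = no limit)
    """
    errors = []
    for length, ann in zip(axis_lens, annotations):
        if ann == "unroll" and max_unroll is not None and length > max_unroll:
            # One message per offending axis, collected before raising.
            errors.append("Too large factor for unrolling")
    if errors:
        raise InstantiationError(errors)


if __name__ == "__main__":
    check_unroll([4, 8], ["unroll", "vec"], max_unroll=16)  # accepted
    try:
        check_unroll([32, 64, 8], ["unroll", "unroll", "vec"], max_unroll=16)
    except InstantiationError as err:
        print("rejected:", err.args[0])
```

Rejecting such configs early is cheap compared with building them, so a number of these errors during tuning is expected rather than a sign that something is broken.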

Maybe we could just add one TFLite network to Auto-tuning a convolutional network for ARM CPUs, and add a note section to Compile TFLite Models that tells users how to get better performance with AutoTVM.

kindlehe · April 17, 2020, 6:45am

Thanks very much! I'll try it.

@anijain2305