[TOPI] Using x86 schedules for ARM conv2d

Thanks a lot! I am very glad that you can share a script to help me get started. You can send me a message through the website's MESSAGE feature. :grinning:

I used the NHWC schedule to tune MobileNet. Here are the results:

Network      | TVM NCHWc (ms) | TFLite NHWC (ms)
mobilenet-v1 | 72.46          | 210.00
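To put the gap in numbers, here is a quick check of the ratio between the two measured mobilenet-v1 latencies. It is plain arithmetic on the values in the table, nothing re-measured:

```python
# Latencies copied from the mobilenet-v1 row of the table (milliseconds).
nchwc_ms = 72.46
nhwc_ms = 210.00

# How many times slower the NHWC column is than the NCHWc column.
slowdown = nhwc_ms / nchwc_ms
print(f"NHWC / NCHWc = {slowdown:.2f}x")  # NHWC / NCHWc = 2.90x
```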

It seems the current NHWC schedule requires deeper investigation. For one thing, we lack an NHWC depthwise schedule. I also saw some warnings/errors while compiling and auto-tuning. I am listing them here because they may point to the next steps. @FrozenGene, can you take a look at improving the NHWC schedule?

Compilation

  1. AlterOpLayout issue - https://github.com/apache/incubator-tvm/pull/5350 It might be possible to hide the kernel layout change.

Auto-tuning

  1. Detect vectorize inside vectorized loop, ignoring…
  2. Large unroll factor - result: MeasureResult(costs=(InstantiationError(['Too large factor for unrolling', 'Too large factor for unrolling'],),), error_no=1
  3. Timeout error - result: MeasureResult(costs=(TimeoutError(),), error_no=6
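The `error_no` field in these log lines is a numeric code. Below is a minimal sketch for decoding it. It assumes the code values of AutoTVM's `MeasureErrorNo` enum as I recall them from `tvm/autotvm/measure/measure.py`, so please check them against your TVM checkout. Under that mapping, `error_no=1` is an instantiation error and `error_no=6` is a build timeout. The helper `decode_error_no` is hypothetical and not part of TVM.

```python
import re

# Assumed copy of AutoTVM's MeasureErrorNo values; verify against your TVM version.
MEASURE_ERROR_NAMES = {
    0: "NO_ERROR",
    1: "INSTANTIATION_ERROR",
    2: "COMPILE_HOST",
    3: "COMPILE_DEVICE",
    4: "RUNTIME_DEVICE",
    5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT",
    7: "RUN_TIMEOUT",
    8: "UNKNOWN_ERROR",
}

_ERROR_NO_RE = re.compile(r"error_no=(\d+)")


def decode_error_no(log_line):
    """Return the symbolic name for the error_no in a printed MeasureResult, or None."""
    match = _ERROR_NO_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return MEASURE_ERROR_NAMES.get(code, f"UNKNOWN_CODE_{code}")


if __name__ == "__main__":
    lines = [
        "MeasureResult(costs=(InstantiationError(['Too large factor for unrolling', "
        "'Too large factor for unrolling'],),), error_no=1",
        "MeasureResult(costs=(TimeoutError(),), error_no=6",
    ]
    for line in lines:
        print(decode_error_no(line))
    # INSTANTIATION_ERROR
    # BUILD_TIMEOUT
```

Under this mapping, the timeout in item 3 would be a build-time timeout rather than a run-time one.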

I also created a quick tutorial here - https://github.com/apache/incubator-tvm/pull/5354

This is a tutorial on tuning a TFLite model for ARM CPUs. It is largely based on two previous tutorials:

  1. Compile TFLite Models - https://docs.tvm.ai/tutorials/frontend/from_tflite.html#sphx-glr-tutorials-frontend-from-tflite-py
  2. Auto-tuning a convolutional network for ARM CPUs - https://docs.tvm.ai/tutorials/autotvm/tune_relay_arm.html#sphx-glr-tutorials-autotvm-tune-relay-arm-py

Actually, it is mostly a copy-paste of those two tutorials. The only interesting change is this one - https://github.com/apache/incubator-tvm/pull/5354/files#r409919577

I am not sure whether we need a new tutorial that is 90% the same as the previous ones. @tqchen, do you have any comments?

@kindlehe you can use the script in the tutorial to get started.

Sure.

This should be a problem with the NHWC schedule.

It is normal, because we have max_unroll to restrict it.
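To illustrate why the "Too large factor for unrolling" entries are expected, here is a simplified sketch of the kind of check a `max_unroll` limit implies. Every axis annotated for unrolling whose length exceeds the limit is recorded as an error, and a config with any error is rejected before it is built or measured. The function name `check_unroll` and the sample axis lengths are hypothetical. As I understand it, AutoTVM does the real check inside its annotation knob and reports the collected messages as an `InstantiationError`. That would explain why the message appears once per offending axis in the log.

```python
def check_unroll(anns, axis_lens, max_unroll=None):
    """Return one error message per 'unroll' annotation on an axis longer than max_unroll.

    Hypothetical, simplified stand-in for AutoTVM's unroll-limit check;
    an empty list means the config would be considered valid.
    """
    errors = []
    for ann, length in zip(anns, axis_lens):
        if ann == "unroll" and max_unroll is not None and length > max_unroll:
            errors.append("Too large factor for unrolling")
    return errors


if __name__ == "__main__":
    # Two reduction axes, both annotated for unrolling (illustrative sizes).
    print(check_unroll(["unroll", "unroll"], [3, 3], max_unroll=16))  # []
    print(check_unroll(["unroll", "unroll"], [32, 32], max_unroll=16))
    # ['Too large factor for unrolling', 'Too large factor for unrolling']
    print(check_unroll(["none", "unroll"], [32, 32], max_unroll=16))
    # ['Too large factor for unrolling']
```

Rejecting such configs up front is cheap. A burst of these errors during tuning mostly means part of the search space is being skipped, not that tuning is broken.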

It is normal. With unrolling, we sometimes hit a build timeout, and if the schedule is poor, we hit a run timeout. I think this is acceptable.

Maybe we could just add one TFLite network to "Auto-tuning a convolutional network for ARM CPUs" and add a note section to "Compile TFLite Models" that tells users how to get better performance by leveraging AutoTVM.

Thanks very much! I'll try it.

@anijain2305