ARM network tuning

fortyq · November 20, 2019, 4:37pm

Hi, I’m tuning MobileNet v2 for RetinaFace. My target looks like this:

llvm -device=arm_cpu -target=arm-linux-gnueabihf -mattr=+neon,+thumb2

I can’t use an RPC server for tuning, so I compiled TVM on the ARM CPU device itself.
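For readers less familiar with the format: the target string above is a backend name (`llvm`) followed by `-key=value` options, here the device (`arm_cpu`), the LLVM target triple, and the CPU features passed via `-mattr`. The hypothetical `parse_target` helper below only splits the string to show those pieces; it is not TVM's own target parser.

```python
def parse_target(target_str):
    """Split a target string into its backend name and -key=value options.

    Hypothetical helper for illustration only; not TVM's target parser.
    """
    backend, *flags = target_str.split()
    options = {}
    for flag in flags:
        # "-device=arm_cpu" -> ("device", "arm_cpu")
        key, _, value = flag.lstrip("-").partition("=")
        options[key] = value
    return backend, options


backend, opts = parse_target(
    "llvm -device=arm_cpu -target=arm-linux-gnueabihf -mattr=+neon,+thumb2")
print(backend)                   # llvm
print(opts["device"])            # arm_cpu
print(opts["mattr"].split(","))  # ['+neon', '+thumb2']
```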

My question is: when I tune with a script similar to the x86 tutorial, the task names change.

conv2d becomes topi_x86_conv2d_NCHWc, and depthwise_conv2d_nchw becomes topi_x86_depthwise_conv2d_NCHWc_from_nchw.

Even though I’m using ARM, it’s suspicious that x86 appears in the names. I printed TASK_TABLE, and it contains those two x86 entries plus intel_graphics and *_int8 ones.

Is that expected, or did I not enable the compile flags that optimize for ARM CPU? Any advice is highly appreciated, thanks.
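A quick way to spot this situation is to scan the extracted task names for another backend's marker. The sketch below assumes the names are available as plain strings (for example from `[t.name for t in tasks]`, since the scripts in this thread read `tasks[i].name`); the function name and the marker list are arbitrary choices for illustration.

```python
def tasks_for_other_backends(task_names, markers=("x86", "intel_graphics")):
    """Return the names that mention a backend other than the intended ARM one."""
    return [name for name in task_names if any(m in name for m in markers)]


# Names taken from this thread; the last one is the name the ARM tuning
# script below checks for.
names = [
    "topi_x86_conv2d_NCHWc",
    "topi_x86_depthwise_conv2d_NCHWc_from_nchw",
    "topi_nn_depthwise_conv2d_nchw",
]
print(tasks_for_other_backends(names))
# ['topi_x86_conv2d_NCHWc', 'topi_x86_depthwise_conv2d_NCHWc_from_nchw']
```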

FrozenGene · November 25, 2019, 10:53am

Please follow the tutorial at https://docs.tvm.ai/tutorials/autotvm/tune_relay_arm.html. Don’t follow the x86 CPU tutorial for ARM CPU tuning; ARM CPU doesn’t have x86’s NCHWc transformation. If you want to use contrib_spatial_pack for depthwise convolution on ARM CPU, please apply this PR first: https://github.com/apache/incubator-tvm/pull/4384, otherwise you will get an error.
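For context on the NCHWc transformation mentioned above: the x86 templates (note the `NCHWc` in their names) work on a blocked layout in which the channel axis is split into an outer axis and an inner block of `c` channels placed last. The NumPy sketch below repacks an NCHW array that way, with a block size of 4 picked arbitrarily and a hypothetical helper name; it only illustrates the layout, not how TVM performs the conversion.

```python
import numpy as np


def nchw_to_nchwc(x, c_block):
    """Pack an (N, C, H, W) array into (N, C // c_block, H, W, c_block)."""
    n, c, h, w = x.shape
    if c % c_block != 0:
        raise ValueError("channel count must be divisible by the block size")
    # (N, C, H, W) -> (N, C//c, c, H, W) -> (N, C//c, H, W, c)
    return x.reshape(n, c // c_block, c_block, h, w).transpose(0, 1, 3, 4, 2)


x = np.arange(1 * 8 * 2 * 2).reshape(1, 8, 2, 2)
packed = nchw_to_nchwc(x, c_block=4)
print(packed.shape)  # (1, 2, 2, 2, 4)
# packed[n, C, h, w, c] holds original channel C * 4 + c.
print(packed[0, 1, 0, 1, 2] == x[0, 6, 0, 1])  # True
```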

fortyq · November 25, 2019, 12:11pm

@FrozenGene thanks, but the problem is that the extracted tasks are not named topi_nn_depthwise_conv2d_nchw or topi_nn_conv2d; they are named conv2d or depthwise_conv2d_nchw, and if I don’t replace the names I get an error. Which op names should I replace conv2d and depthwise_conv2d_nchw with for ARM? Thank you!

FrozenGene · November 25, 2019, 12:15pm

Don’t replace them with anything on ARM. Just follow the tutorial. I suggest you run the tutorial first, and only after reproducing it move on to your own model. Go step by step, and don’t modify it before you fully understand it. If you can’t run the tutorial, report the issue back.

fortyq · November 25, 2019, 12:17pm

Thank you, that makes sense. I’ll do that first.

fortyq · November 25, 2019, 1:55pm

@FrozenGene I get the following error for both the test example and my net:

Traceback (most recent call last):
  File "local_tuning.py", line 122, in <module>
    tune_kernels(tasks, **tuning_option)
  File "local_tuning.py", line 65, in tune_kernels
    'contrib_spatial_pack')
  File "/home/pi/distr/incubator-tvm/python/tvm/autotvm/task/task.py", line 178, in create
    func = TASK_TABLE[func_name]
KeyError: 'topi_nn_depthwise_conv2d_nchw'
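The last frame shows that `create` looks the task name up in a plain dictionary (`func = TASK_TABLE[func_name]`), so the KeyError means that, at the time of the call, no template with that name had been registered in the running process. A toy model of that registry pattern makes the failure mode concrete; `registry`, `register_task`, `create_task` and `toy_conv2d` are hypothetical local names, not TVM's implementation.

```python
# Toy model of a name -> template registry (hypothetical, not TVM's code).
registry = {}


def register_task(name):
    """Decorator that stores the decorated function under `name`."""
    def _register(func):
        registry[name] = func
        return func
    return _register


def create_task(func_name, *args):
    func = registry[func_name]  # KeyError if `func_name` was never registered
    return func(*args)


@register_task("toy_conv2d")
def toy_conv2d(x):
    return x * 2


print(create_task("toy_conv2d", 21))  # 42
try:
    create_task("topi_nn_depthwise_conv2d_nchw")
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'topi_nn_depthwise_conv2d_nchw'
```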

The tuning script looks like this:

import os
import tvm
from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner
import joblib
# from tvm import relay
# import nnvm


with open("tasks.pickle", "rb") as f:
    tasks = joblib.load(f)

target = tvm.target.create('llvm -device=arm_cpu -target=arm-linux-gnueabihf -mattr=+neon,+thumb2')

model_name = "mnet.025"
log_file = "%s_tuning_1.log" % model_name

tuning_option = {
    'log_filename': log_file,
    'tuner': 'random',
    'early_stopping': None,
    'n_trial': 2,
    'winograd': True,
    'spatial_pack': True,

    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10, repeat=1,
                                   min_repeat_ms=1000),
    ),
}


def tune_kernels(tasks,
                 measure_option,
                 tuner='gridsearch',
                 early_stopping=None,
                 n_trial=1,
                 winograd=False,
                 spatial_pack=False,
                 log_filename='tuning.log'):

    if winograd:
        for i in range(len(tasks)):
            try:  # try winograd template
                tsk = autotvm.task.create(tasks[i].name, tasks[i].args,
                                          tasks[i].target, tasks[i].target_host, 'winograd')
                input_channel = tsk.workload[1][1]
                if input_channel >= 64:
                    tasks[i] = tsk
            except Exception:
                pass

    if spatial_pack:
        for i in range(len(tasks)):
            if tasks[i].name == 'topi_nn_depthwise_conv2d_nchw':
                tsk = autotvm.task.create(tasks[i].name, tasks[i].args,
                                          tasks[i].target, tasks[i].target_host,
                                          'contrib_spatial_pack')
                tasks[i] = tsk

    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i+1, len(tasks))

        # create tuner
        if tuner == 'xgb' or tuner == 'xgb-rank':
            tuner_obj = XGBTuner(task, loss_type='rank')
        elif tuner == 'ga':
            tuner_obj = GATuner(task, pop_size=50)
        elif tuner == 'random':
            tuner_obj = RandomTuner(task)
        elif tuner == 'gridsearch':
            tuner_obj = GridSearchTuner(task)
        else:
            raise ValueError("Invalid tuner: " + tuner)

        # do tuning
        # clamp the budget per task, without overwriting n_trial for later tasks
        task_n_trial = min(n_trial, len(task.config_space))
        tmp_log_file = log_filename + ".tmp"
        if os.path.exists(tmp_log_file):
            os.remove(tmp_log_file)

        tuner_obj.tune(n_trial=task_n_trial,
                       early_stopping=early_stopping,
                       measure_option=measure_option,
                       callbacks=[
                           autotvm.callback.progress_bar(task_n_trial, prefix=prefix),
                           autotvm.callback.log_to_file(tmp_log_file)])

        autotvm.record.pick_best(tmp_log_file, log_filename)
        os.remove(tmp_log_file)


tune_kernels(tasks, **tuning_option)
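The per-task trial budget is easy to get wrong: reassigning the shared `n_trial` inside the loop (for example `n_trial = min(n_trial, len(task.config_space))`) lets one task with a small config space shrink the budget of every task after it. A pure-Python sketch with made-up config-space sizes shows the difference; both function names are hypothetical.

```python
def per_task_trials(n_trial, config_space_sizes):
    """Clamp the budget separately for each task; n_trial itself never changes."""
    return [min(n_trial, size) for size in config_space_sizes]


def per_task_trials_reassigning(n_trial, config_space_sizes):
    """Same loop, but overwriting n_trial, so the smallest space seen so far wins."""
    out = []
    for size in config_space_sizes:
        n_trial = min(n_trial, size)
        out.append(n_trial)
    return out


sizes = [1000, 12, 500]  # made-up config-space sizes
print(per_task_trials(200, sizes))              # [200, 12, 200]
print(per_task_trials_reassigning(200, sizes))  # [200, 12, 12]
```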

The task extraction script looks like this:

import mxnet as mx
import tvm
from tvm import autotvm
import tvm.relay as relay
import tvm.relay.testing
import joblib


def get_network(batch_size=1):
    image_shape = (3, 224, 224)
    input_shape = (batch_size, ) + image_shape
    input_layer = 'data'
    shape_dict = {input_layer: input_shape}
    dtype = 'float32'

    mod, params = relay.testing.mobilenet.get_workload(batch_size=batch_size)

    return mod, params, shape_dict, dtype


def extract_tasks():
    target = tvm.target.create('llvm -device=arm_cpu -target=arm-linux-gnueabihf -mattr=+neon,+thumb2')
    net, params, shape_dict, dtype = get_network()

    print("Extract tasks...")
    tasks = autotvm.task.extract_from_program(net["main"],
                                              target=target,
                                              params=params,
                                              ops=(relay.op.nn.conv2d, )
                                              )

    with open('tasks.pickle', 'wb') as f:
        joblib.dump(tasks, f)
        print("Pickled successfully ...")


extract_tasks()
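The two scripts hand the tasks over through `tasks.pickle` (`joblib.dump`/`joblib.load` serialize with pickle). A stdlib-only sketch of that kind of handoff, with a hypothetical plain-dict record standing in for an autotvm task and a hypothetical `extract` function, shows what a round trip preserves: the object's stored data. Other effects of code that ran in the producing script are not replayed when the file is loaded; the sketch simulates a separate process by clearing a side-effect log before loading.

```python
import io
import pickle

SIDE_EFFECTS = []  # records setup work done in *this* process


def extract():
    """Toy 'extraction' that returns data and also performs a side effect."""
    SIDE_EFFECTS.append("registered templates")
    return [{"name": "topi_nn_depthwise_conv2d_nchw", "template": "direct"}]


# Producer side: extract, then serialize the result.
buf = io.BytesIO()
pickle.dump(extract(), buf)

# Consumer side: simulate a fresh process by clearing the side-effect log.
SIDE_EFFECTS.clear()
buf.seek(0)
loaded = pickle.load(buf)
print(loaded[0]["name"])  # topi_nn_depthwise_conv2d_nchw
print(SIDE_EFFECTS)       # [] (the data came back, the side effect did not)
```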