Auto-tuned model speed-up issue

After 15 days of auto-tuning, I got an auto-tuned model, but the speed is not satisfactory. Any advice? Can I modify the tuning parameters while using the best history log as a checkpoint and resume from it?

You can try the XGB search approach and enable transfer learning to see if that helps. See this tutorial for an example. Specifically, you could use the following logic to enable transfer learning before launching the tuner.

    import os
    from tvm import autotvm

    # Seed the tuner with the records from the previous run's log before tuning.
    if use_transfer_learning:
        if os.path.isfile(tmp_log_file):
            tuner_obj.load_history(autotvm.record.load_from_file(tmp_log_file))

The model was tuned with the XGB tuner. What I mean is: what tuning options should I set to search a larger space and get a better result? Thanks, man.

Unfortunately, there are no user APIs to adjust the tuning space if your ops are using TOPI schedule templates. On the other hand, I personally don’t think a larger space would achieve better performance, because the performance might be limited by the schedule template rather than the configs, especially since you have already spent 15 days. I would suggest first finding the performance bottleneck (e.g., which layer/op) and then seeing whether an alternative op can speed it up. For example, if the performance bottleneck is conv2d in NCHW layout, we can try to speed it up with the HWCN layout or the Winograd algorithm.

Thanks, man. I just followed the tutorials you gave to auto-tune, and my ops contain only conv2d/deconv2d/pool/batchnorm/relu. Two questions:

  1. How can I know whether I am using the TOPI schedule templates?
  2. How can I find the performance bottleneck? Any advice? Also, the tuned model is already using Winograd; how can I speed it up with HWCN?

Thanks, man.

  1. Basically, if you followed the tutorial to extract tasks from the program, then you are using TOPI schedules.

  2. You can refer to Profiling a TVM run on CPU and use the graph runtime debugger, which will show you the time breakdown of each op (see the sketch below).
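For reference, here is a minimal sketch of running a model under the graph runtime debugger, following that tutorial. The input name "data" and its shape are placeholders you would replace with your own, and newer TVM versions expose the same functionality as tvm.contrib.debugger.debug_executor:

    import numpy as np
    import tvm
    from tvm import relay
    from tvm.contrib.debugger import debug_runtime

    # Assumes `mod`, `params`, and `target` are the same objects used for tuning.
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)

    ctx = tvm.cpu(0)
    m = debug_runtime.create(graph, lib, ctx, dump_root="/tmp/tvmdbg")
    m.set_input(**params)
    m.set_input("data", np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
    m.run()  # prints a per-op time breakdown and dumps it under dump_root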

If you already use Winograd, then the benefit from HWCN might be limited, because the HWCN layout cannot be applied to Winograd. Anyway, you could identify the bottleneck first and report its op/shape/attributes, so that other folks can also share their thoughts.

Thanks, man. I will give it a try.

Hi, I tried your profiling code and got the output below. Any advice for improving the speed? The log is here: tvm_profiling.log. Could you please give me some advice? Thank you very much.

Sorting the results by Time (%), the most time-consuming ops (>2%) are the following:

    Op                                            Time (us)   Time (%)   Shape
    fused_nn_conv2d_add_nn_relu_7                  5114.48     13.07     (1, 194, 40, 40)
    fused_nn_conv2d_transpose_add_add_nn_relu      3399.83      8.688    (1, 76, 160, 160)
    fused_nn_conv2d_transpose_add_add_nn_relu_2    2816.88      7.198    (1, 304, 40, 40)
    fused_nn_conv2d_transpose_add_add_nn_relu_1    2786.62      7.121    (1, 152, 80, 80)
    fused_nn_conv2d_add_nn_relu_3                  2099.46      5.365    (1, 509, 20, 20)

So you can further tune those 5 tasks.
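A rough sketch of what re-tuning only those tasks could look like, assuming `selected_tasks` holds the matching tasks (how to find them is discussed further down the thread) and `log_file` is your existing tuning log, following the standard autotvm tutorial flow:

    from tvm import autotvm
    from tvm.autotvm.tuner import XGBTuner

    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(timeout=10),
        runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=1000),
    )

    for task in selected_tasks:
        tuner = XGBTuner(task, loss_type="rank")
        # Seed the tuner with the records from the previous 15-day run.
        tuner.load_history(autotvm.record.load_from_file(log_file))
        tuner.tune(
            n_trial=min(2000, len(task.config_space)),
            early_stopping=600,
            measure_option=measure_option,
            callbacks=[autotvm.callback.log_to_file(log_file)],
        )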

Thank you very much, I will give it a try.

Hi, @comaniac. How can I extract the selected tasks by name? I ran the following code to extract all tasks:

tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params, ops=(relay.op.nn.conv2d,))

But when I print the name, it is always ‘topi_nn_conv2d’. So I have two questions:

  1. How can I extract a certain task by name, say, ‘fused_nn_conv2d_add_nn_relu_7’?
  2. When extracting tasks with autotvm.task.extract_from_program, what does the ops argument mean? If my network contains deconvolution, should I add it to the ops argument?

Thanks, man.

  1. There won’t be an op named “fused…”. Fusion happens when building the model, while auto-tuning only considers one op at a time. Thus what you have seen is correct: their task names are all “topi_nn_conv2d”.

  2. The arguments are the attributes of that op. For example, the arguments of conv2d are:

  • Input data
  • Input weight
  • Strides
  • Padding
  • Dilation
  • Data layout
  • Data type

Different ops require different arguments when creating tasks. As for your question, deconvolution is another op, so it will have a different task name, and so will its arguments. In fact, you can print out that information after extracting tasks, e.g. print(task.args).
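A small sketch of that, reusing the same `mod`, `params`, and `target` as in the extract_from_program call above and adding conv2d_transpose so deconvolution layers also get tasks (whether deconvolution is covered by the ops list may depend on your TVM version):

    from tvm import autotvm, relay

    tasks = autotvm.task.extract_from_program(
        mod["main"],
        target=target,
        params=params,
        ops=(relay.op.nn.conv2d, relay.op.nn.conv2d_transpose),
    )
    for i, task in enumerate(tasks):
        print(i, task.name)    # conv2d and deconv tasks get different names
        print("  ", task.args)  # e.g. ('TENSOR', (1, 194, 40, 40), 'float32'), strides, padding, ...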

So how can I auto-tune only the “fused…” tasks? How do I pick out the most time-consuming ops shown in the profiling log?

What you need to focus on is the shape and attributes of the op. For example, fused_nn_conv2d_add_nn_relu_7 5114.48 13.07 (1, 194, 40, 40) means the input shape is (1, 194, 40, 40), so you go back to the topi_nn_conv2d tasks and see which one has this shape.
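For example, a hypothetical helper along these lines could filter the extracted tasks by the input shape reported in the profile; it assumes the first task argument is the input tensor in the form ('TENSOR', shape, dtype), which is how the TOPI conv2d tasks are typically laid out:

    def find_tasks_by_input_shape(tasks, shape):
        """Return the tasks whose input tensor shape matches `shape`."""
        matched = []
        for task in tasks:
            data = task.args[0]  # typically ('TENSOR', (N, C, H, W), dtype)
            if isinstance(data, tuple) and len(data) == 3 and tuple(data[1]) == tuple(shape):
                matched.append(task)
        return matched

    # e.g. the biggest entry in the profile above
    bottleneck_tasks = find_tasks_by_input_shape(tasks, (1, 194, 40, 40))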

OK, so I need to match the task by its shape. Thanks, I will give it a try.

But what if two ops have the same shape? How can I tell them apart? I think the ops should have a name attribute.

If ops have the same shape and parameters, then the tuned schedule is the same, which means there will be only one task with that specific shape and parameters.

Hi, please check the log I posted above. It is true that many ops have the same shape; I cannot tell which task an op belongs to just by its shape. Please help me. Thank you very much.

Hi @comaniac, I have two questions about this issue.

  1. I found some tasks with the same shape; how can I tell them apart?
  2. I found that there are 106 tasks when extracting tasks, but the graph_runtime_debug.cc:93 log reports 158 ops, and there are 177 ops in the time-statistics table. What do these three different numbers stand for?

Thank you very much.
Best,
Bin

Compare the workloads; every task’s workload is unique.
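In other words, even if two tasks share the same input shape, their workload tuples still differ in the remaining fields (weight shape, strides, padding, layout, dtype). A quick way to see this, assuming `tasks` comes from the extraction sketch above:

    for task in tasks:
        # workload = (task name, input tensor, weight tensor, strides, padding, ...)
        print(task.workload)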