[relay][x86][graph_tuner] graph tuner error


#1

When I auto-tuned a modified MobileFaceNet, I encountered the following error:


#2

@kevinthesun can you have a look?


#3

If I use MXNet for inference, the time is 41 ms. When I use apply_history_best("model.log"), the time is 360 ms.
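For context, a minimal sketch of this kind of build (the checkpoint prefix "mobilefacenet", epoch 0, and the input shape are assumptions, not my exact script; only model.log and the AVX2 target are from this thread):

```python
import mxnet as mx
import tvm
from tvm import autotvm, relay

# Hypothetical checkpoint prefix and input shape; adjust to the real model.
sym, arg_params, aux_params = mx.model.load_checkpoint("mobilefacenet", 0)
shape_dict = {"data": (1, 3, 112, 112)}
mod, params = relay.frontend.from_mxnet(sym, shape_dict,
                                        arg_params=arg_params,
                                        aux_params=aux_params)

target = tvm.target.create("llvm -mcpu=core-avx2")

# Building inside apply_history_best picks the tuned schedules from the log;
# outside this context, relay.build falls back to the default schedules.
with autotvm.apply_history_best("model.log"):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
```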


#4

If I just use relay.build without AutoTVM, the time is 390 ms. It is so weird.


#5

Did you set an appropriate LLVM target?


#6

Yes, target = tvm.target.create("llvm -mcpu=core-avx2")


#7

If I use the GPU (target = "cuda"), the time is 5 ms.


#8

When I build with opt_level=1 or opt_level=2, the time is 130 ms; with opt_level=3 it is 390 ms.
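A hedged sketch of this comparison, using the graph_runtime API of this TVM era (the input shape is assumed; mod, params, and target are as in the build sketch above):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

data = np.random.uniform(size=(1, 3, 112, 112)).astype("float32")  # assumed shape

for opt_level in (1, 2, 3):
    with relay.build_config(opt_level=opt_level):
        graph, lib, built_params = relay.build(mod, target=target, params=params)
    ctx = tvm.cpu(0)
    module = graph_runtime.create(graph, lib, ctx)
    module.set_input("data", data)
    module.set_input(**built_params)
    # time_evaluator averages several runs for a stable wall-clock number.
    ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
    ms = np.mean(np.array(ftimer().results)) * 1000
    print("opt_level=%d: %.1f ms" % (opt_level, ms))
```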


#9

link: https://pan.baidu.com/s/1HJnYdgt7aYoC1jwub3GaMw
code: bjys
This is the model; you can check it if you have time. Or can you tell me how to track down the problem?


#10

You can use debug_runtime to see whether conv2d execution takes the majority of the time.
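For example, a minimal sketch (assuming graph, lib, and params come from a relay.build call, and the same assumed input shape as above):

```python
import numpy as np
import tvm
from tvm.contrib.debugger import debug_runtime

ctx = tvm.cpu(0)
# dump_root is where per-node traces are written; the path is arbitrary.
m = debug_runtime.create(graph, lib, ctx, dump_root="/tmp/tvmdbg")
m.set_input("data", np.random.uniform(size=(1, 3, 112, 112)).astype("float32"))
m.set_input(**params)
m.run()  # prints a per-operator time breakdown to stdout
```

The printed table lists each fused operator with its running time, so it is easy to see whether a handful of conv2d kernels dominate the total.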


#11

When I set opt_level=3, the debug_runtime output is as follows:


#12

When I set opt_level=1, the output is:


#13

The original MobileFaceNet output:


#14

I find that the modified MobileFaceNet costs much more time for the same ops.


#15

I have the same timing problem when deploying an r100 model: the time is 700 ms with opt_level=3 and 220 ms with opt_level=1. Can you have a look at this problem? @kevinthesun @tqchen


#16

The output has a lot of information like this when I set opt_level=3:


#17

When setting opt_level=1, there is no depthwise-conv2d in the debug output. Can you look into this?


#18

There is no depthwise-conv2d in the debug output for the original MobileFaceNet, but the modified MobileFaceNet does have depthwise-conv2d. The original MobileFaceNet is fine with opt_level=3 and works well with the graph tuner; the modified MobileFaceNet is not. A sketch for confirming this from the Relay graph itself follows.
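A hedged way to confirm this from the Relay graph rather than from the debug output (assuming a TVM version where relay.analysis.post_order_visit is available, and mod is the module returned by relay.frontend.from_mxnet; in older versions from_mxnet returns a Function, which you would pass directly):

```python
from tvm import relay

def count_ops(func):
    """Count operator occurrences in a Relay function (illustrative sketch)."""
    counts = {}
    def visit(node):
        if isinstance(node, relay.Call) and hasattr(node.op, "name"):
            name = node.op.name
            # Grouped conv2d (groups > 1) is what the x86 schedules treat as
            # depthwise conv2d when groups equals the input channel count.
            if name == "nn.conv2d" and int(node.attrs.groups) > 1:
                name = "nn.conv2d (grouped/depthwise)"
            counts[name] = counts.get(name, 0) + 1
    relay.analysis.post_order_visit(func, visit)
    return counts

print(count_ops(mod["main"]))
```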


#19

It seems the depthwise_conv2d layer costs a lot of time.


#20

I have exactly the same problem. With opt_level=3, inference time is more than 2x slower than with opt_level=1.