How to improve the auto-tune performance?

Besides apply_history, another possible reason is that the schedule has changed. Even if you leverage the old tuned log, you may still have a chance of getting better performance.
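For reference, applying a tuned log when compiling typically looks like the sketch below (a minimal example of the standard autotvm flow; the network, target, and log path are placeholders rather than anything taken from this thread):

import tvm
from tvm import autotvm, relay
from tvm.relay import testing

# Placeholder network and log file; substitute your own model and tuning records.
mod, params = testing.mobilenet.get_workload(batch_size=1)
target = "cuda"
log_file = "mobilenet.log"

# apply_history_best picks, for each task, the best record found in the log.
# Workloads missing from the log silently fall back to the default schedule.
with autotvm.apply_history_best(log_file):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)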

I have another log and lib that reach 680 fps, generated by my workmate 5 months ago.

With the code from 1 month ago, I applied history with the 680 fps log and still got a 530 fps deploy_*.so, .params, etc. That is not normal, right?

Then I checked out tag v0.6.0, recompiled the model with the 680 fps log, and got 771.36 fps. That should be the result of the schedule change.

The conflict is as follows:

The slower log produced 680 fps with the TVM from about 5 months ago.

Using the TVM from 1 month ago, the faster log only gets 530 fps and the slower log also gets 530 fps.

Using the v0.6.0 tag, they get 771 and 840 fps respectively.

I think it should be the schedule change. I can't be sure whether this PR is the cause: https://github.com/apache/incubator-tvm/pull/4511. Asymmetric padding means the padding in our workload is now 4D (previously 2D), so the previous log may not work correctly. Have you tried the latest TVM (tuning from scratch and running)? It should be OK.
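To see why old records can stop matching, here is a simplified illustration of the 2D versus 4D padding change (a hypothetical helper, not TVM's actual record parser; the 4-element ordering is an assumption):

# Before PR 4511 a conv2d workload stored padding as (pad_h, pad_w); afterwards it
# carries four values, e.g. (pad_top, pad_left, pad_bottom, pad_right). Old log
# entries therefore no longer match the new workload keys, and apply_history_best
# silently falls back to the default schedule for those layers.

def normalize_padding(padding):
    """Expand a symmetric 2-element padding into the 4-element form."""
    if len(padding) == 2:
        pad_h, pad_w = padding
        return (pad_h, pad_w, pad_h, pad_w)
    return tuple(padding)

old_workload_padding = (1, 1)        # as stored in a log from before the change
new_workload_padding = (1, 1, 1, 1)  # as produced by the new op definition
assert normalize_padding(old_workload_padding) == new_workload_padding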

I used v0.6.0, and the most recent commit is:

commit c6f8c23c349f3ef8bacceaf3203f7cc08e6529de
Author: Thierry Moreau <moreau@uw.edu>
Date:   Tue Nov 26 19:21:56 2019 -0800

[VTA][HotFix] Relay->VTA quantization fix (#4433)

* relay -> vta fix

* setting optlevel to 3 for quantization to fold batchnorm

It seems this PR is from 7 days ago and has no relation to our discussion.

I cloned the latest code from GitHub, and it crashes at runtime with the following log. I'm using TF 1.12.0. Any advice?

Traceback (most recent call last):
  File "from_tf.py", line 17, in <module>
    import tvm.relay.testing.tf as tf_testing
  File "/home/cephfs/data/tvm/python/tvm/relay/testing/tf.py", line 34, in <module>
    tf_compat_v1 = tf.compat.v1
AttributeError: module 'tensorflow._api.v1.compat' has no attribute 'v1'

This error should not be related to our issue. You could work around it by just replacing tf.py with the v0.6.0 version and proceeding.
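For TF 1.12 the missing tf.compat.v1 attribute can also be worked around with a small guard in tf.py (a sketch of the usual fallback; adapt it to your local copy):

import tensorflow as tf

# TF 1.12 does not expose tf.compat.v1, so fall back to the top-level module.
try:
    tf_compat_v1 = tf.compat.v1
except AttributeError:
    tf_compat_v1 = tf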

1. You mean to replace tf.py with the v0.6.0 version and use the latest code? I tried this and it seems OK.
2. If I use the latest code, is the older log no longer usable?

I used the latest code to apply the older log, and it's 530 fps again. This log gives 840 fps with v0.6.0.

Also, when I ran the latest version to generate a model with the pre-defined configuration in TopHub, it crashed.

Traceback (most recent call last):
  File "from_tf.py", line 445, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "from_tf.py", line 395, in main
    eval_tvm(image)
  File "from_tf.py", line 189, in eval_tvm
    module.run()
  File "/home/hadoop-hdp/cephfs/data/yuweilong/tvm/python/tvm/contrib/graph_runtime.py", line 169, in run
    self._run()
  File "/home/hadoop-hdp/cephfs/data/yuweilong/tvm/python/tvm/_ffi/_ctypes/function.py", line 207, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/hadoop-hdp/cephfs/data/yuweilong/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7fef40a1ca61]
  [bt] (7) /home/hadoop-hdp/cephfs/data/yuweilong/tvm/build/libtvm.so(tvm::runtime::GraphRuntime::Run()+0x47) [0x7fef40a66167]
  [bt] (6) /home/hadoop-hdp/cephfs/data/yuweilong/tvm/build/libtvm.so(+0x12040d7) [0x7fef40a660d7]
  [bt] (5) /home/hadoop-hdp/cephfs/data/yuweilong/tvm/build/libtvm.so(+0x11a6a47) [0x7fef40a08a47]
  [bt] (4) /home/hadoop-hdp/cephfs/data/yuweilong/dl-benchmark/models/MobilenetV1/deploy_lib.tar.so(fused_nn_conv2d_19+0x2fe) [0x7fef30261d0e]
  [bt] (3) /home/hadoop-hdp/cephfs/data/yuweilong/dl-benchmark/models/MobilenetV1/deploy_lib.tar.so(+0x7168) [0x7fef30262168]
  [bt] (2) /home/hadoop-hdp/cephfs/data/yuweilong/tvm/build/libtvm.so(TVMBackendGetFuncFromEnv+0x60) [0x7fef40a1c970]
  [bt] (1) /home/hadoop-hdp/cephfs/data/yuweilong/tvm/build/libtvm.so(tvm::runtime::ModuleNode::GetFuncFromEnv(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3da) [0x7fef40a1701a]
  [bt] (0) /home/hadoop-hdp/cephfs/data/yuweilong/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7fef401f8662]
  File "/home/hadoop-hdp/cephfs/data/yuweilong/tvm/src/runtime/module.cc", line 123
  File "/home/hadoop-hdp/cephfs/data/yuweilong/tvm/src/runtime/library_module.cc", line 91
TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: f != nullptr: Cannot find function tvm.contrib.cudnn.conv2d.forward in the imported modules or global registry

2. If I use the latest code, is the older log no longer usable?

Yes, because of the asymmetric padding PR I mentioned. The padding becomes 4D. You had better re-tune.
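For reference, re-tuning roughly follows the standard autotvm tutorial of that era, along these lines (a sketch; the network, log path, trial count, and measure options are placeholders):

from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner
from tvm.relay import testing

target = "cuda"
log_file = "mobilenet_retuned.log"  # placeholder output path

# Placeholder network; substitute the module/params imported from your frontend.
mod, params = testing.mobilenet.get_workload(batch_size=1)

# Extract the tunable conv2d tasks from the Relay program.
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params, ops=(relay.op.nn.conv2d,)
)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=100),
)

for i, task in enumerate(tasks):
    tuner = XGBTuner(task, loss_type="rank")
    n_trial = min(1000, len(task.config_space))
    tuner.tune(
        n_trial=n_trial,
        measure_option=measure_option,
        callbacks=[
            autotvm.callback.progress_bar(n_trial, prefix="[Task %2d/%2d]" % (i + 1, len(tasks))),
            autotvm.callback.log_to_file(log_file),
        ],
    )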

Also, when I ran the latest version to generate a model with the pre-defined configuration in TopHub, it crashed.

The error should be because you did not enable cuDNN when building TVM.
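A quick way to check whether the cuDNN runtime functions are registered in your TVM build, and to see when a compiled module would call into cuDNN at all (a sketch; the function name is the one from the error message above):

import tvm

# The crash says tvm.contrib.cudnn.conv2d.forward is missing, which means the TVM
# runtime was built without cuDNN support (USE_CUDNN OFF in config.cmake).
has_cudnn = tvm.get_global_func("tvm.contrib.cudnn.conv2d.forward", allow_missing=True)
print("cuDNN conv2d registered:", has_cudnn is not None)

# A compiled module only calls into cuDNN if the target string requests it:
target_with_cudnn = "cuda -libs=cudnn"  # requires USE_CUDNN ON when building TVM
target_plain = "cuda"                   # uses TVM's own CUDA schedules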

Well, I don’t know why cudnn is called.

I'm using the following code:

When I compile mobilenet without auto-tuning, I can get the pre-defined config. This code works with the older version of TVM, and I get a 720 fps mobilenet.
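For reference, compiling without any local tuning and relying on the pre-defined TopHub records usually looks like the sketch below (placeholder network and target, not the poster's actual script; relay.build falls back to TopHub automatically, the explicit context just makes that visible):

import tvm
from tvm import autotvm, relay
from tvm.relay import testing

target = "cuda"

# Placeholder network; in practice mod/params come from the TensorFlow frontend.
mod, params = testing.mobilenet.get_workload(batch_size=1)

# With no apply_history_best in scope, relay.build queries the TopHub fallback
# context, which downloads pre-tuned records for common workloads.
with autotvm.tophub.context(target):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)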

Anyway, I'd better use v0.6.0 for now. It's too hard for me to debug these problems; I don't understand TVM well enough yet.

If you don't compile the target with cuDNN, I think it is a bug in TVM. If you could spare some time, you could prepare a reproducible script and open an issue.

It is fine to use v0.6.0.

OK, I’ll confirm whether it’s a bug.

It's OK, no crash. I made a mistake when modifying my code.

The only problem with the latest code is the speed issue.

I used the simple build API to compile mobilenet_v1_1.0_224_frozen.pb with opt level 3 and got 511 fps. The default config should give 700+ fps.
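The simple build flow for a frozen TensorFlow graph is roughly the following (a hypothetical minimal script modeled on the from_tensorflow tutorial; the input/output node names and shapes are assumptions, not taken from this thread):

import tensorflow as tf
import tvm
from tvm import relay
import tvm.relay.testing.tf as tf_testing

model_path = "mobilenet_v1_1.0_224_frozen.pb"
out_node = "MobilenetV1/Predictions/Reshape_1"  # assumed output node name

# Load and clean up the frozen graph.
with tf.gfile.GFile(model_path, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
graph_def = tf_testing.ProcessGraphDefParam(graph_def)

# Convert to Relay and build with opt level 3.
mod, params = relay.frontend.from_tensorflow(
    graph_def, layout="NCHW", shape={"input": (1, 224, 224, 3)}, outputs=[out_node]
)
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target="cuda", params=params)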

Maybe the configuration parsing format has changed and you forgot to update TopHub? Here is the code.

I'll use v0.6.0 for a while.

TopHub should be updated; however, that PR is not mine, and I don't have the related machines either. Maybe someone could help retune and update the TopHub log.

We also encountered the workload mismatch issue due to asymmetric padding, but we have no idea who we should contact to update TopHub. @tqchen @merrymercy could you help find a POC?

To conclude this post: the code version I used, from around early December 2019, had something wrong. The auto-tune itself is fine and finds the best configurations during the n_trials. But when it comes to apply_history, those configurations are simply discarded. The bad performance is entirely due to this.
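One way to sanity-check that apply_history is really picking up the tuned records is to extract just the best entry per workload from the log and inspect it (a sketch using autotvm's record utilities; file names are placeholders):

import numpy as np
from tvm import autotvm

full_log = "mobilenet.log"        # everything the tuner measured
best_log = "mobilenet_best.log"   # only the fastest record per workload

# Keep only the best measured configuration for each workload.
autotvm.record.pick_best(full_log, best_log)

# Each record pairs a workload key with its measured cost. At compile time,
# apply_history_best matches on this key, so if the key format changed
# (for example 2D -> 4D padding), the record is silently ignored.
for inp, res in autotvm.record.load_from_file(best_log):
    print(inp.task.workload, "->", float(np.mean(res.costs)) * 1000, "ms")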

I don't know if it's fixed in the most recent version. I'll use tag v0.6.0 for a while.

I tested tag v0.6.0 to auto-tune mobilenet and found another problem.

The whole process consists of auto-tune, apply_history, and build.

The first time, I auto-tuned and built to generate the model, and the generated model performs badly, about 40 fps. This process generates a mobilenet.log.

Then I skipped the auto-tune step and used the log with apply_history; the generated model reaches 690 fps.

There is a big difference between calling apply_history alone and calling apply_history right after auto-tune.

Here is the log. In the first experiment, it got stuck after auto-tune & eval-tvm, so I killed it.

In the second experiment, I compiled with mobilenet.log directly.

Notice the inference costs are 21 ms and 1 ms respectively; where do you think the difference comes from? I'm using the same log file.

Not sure where this large gap comes from. The process you followed seems fine to me. Failing to warm up the GPU may cause some performance difference, but I've never seen a ~20x difference…
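To rule out warm-up effects when comparing the two builds, timing with the graph runtime's time_evaluator (many runs, statistics reported) is more reliable than timing a single call. A sketch, assuming graph, lib, and params come from relay.build and the input tensor is named "input":

import numpy as np
import tvm
from tvm.contrib import graph_runtime

ctx = tvm.gpu(0)
module = graph_runtime.create(graph, lib, ctx)
module.set_input(**params)
module.set_input("input", np.random.uniform(size=(1, 224, 224, 3)).astype("float32"))

# Warm up and then time many runs; a cold GPU or lazy initialization only affects
# the first few iterations and cannot explain a ~20x gap.
ftimer = module.module.time_evaluator("run", ctx, number=100, repeat=3)
prof_res = np.array(ftimer().results) * 1000  # convert to milliseconds
print("Mean inference time: %.2f ms (std %.2f)" % (prof_res.mean(), prof_res.std()))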