[VTA] Issue with TSIM based simulation of resnet18 inference on VTA

I think there is an issue with the current TSIM backend for ResNet-18 inference on VTA. The failure is consistently reproducible in my experiments on both Linux and macOS, while the same run with the FSIM backend succeeds.

On both Linux and macOS, the run crashes with a segmentation fault. To reproduce the error, configure vta/config/vta_config.json to use the tsim backend, and run deploy_vision_on_vta.py with python3.
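For reference, the configuration change can be scripted instead of edited by hand. This is a minimal sketch assuming vta_config.json carries a top-level "TARGET" key (the helper name set_target is my own, not part of TVM):

```python
import json

def set_target(cfg_path, target):
    """Rewrite the "TARGET" field of a VTA config file (e.g. "sim" -> "tsim")."""
    with open(cfg_path) as f:
        cfg = json.load(f)
    cfg["TARGET"] = target
    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)

# e.g. set_target("vta/config/vta_config.json", "tsim")
```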

The same error occurs even after pulling the latest code as of Oct 7, 2019!

It happens when execution reaches:

timer = m.module.time_evaluator("run", ctx, number=num, repeat=rep)
timer()

The error is as follows:

Stack trace:
  [bt] (0) /home/sun/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2b64150) [0x7f8b90894150]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20) [0x7f8ba416ef20]
  [bt] (2) /tmp/tmpzoqyk19b/graphlib.o.so(+0x1f27e) [0x7f8b6b6cf27e]
  [bt] (3) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(tvm::runtime::ThreadPool::Launch(int (*)(int, TVMParallelGroupEnv*, void*), void*, int, int)+0xfee) [0x7f8b63b3539e]
  [bt] (4) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(TVMBackendParallelLaunch+0x63) [0x7f8b63b32e93]
  [bt] (5) /tmp/tmpzoqyk19b/graphlib.o.so(+0x1ed2b) [0x7f8b6b6ced2b]
  [bt] (6) /tmp/tmpzoqyk19b/graphlib.o.so(fused_nn_conv2d_add_nn_relu+0x3c3) [0x7f8b6b6ce8e3]
  [bt] (7) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(+0xbd8210) [0x7f8b63b20210]
  [bt] (8) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(+0xc2bfe7) [0x7f8b63b73fe7]
Stack trace:
  [bt] (0) /home/sun/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2b64150) [0x7f8b90894150]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20) [0x7f8ba416ef20]
  [bt] (2) /tmp/tmpzoqyk19b/graphlib.o.so(+0x1f27e) [0x7f8b6b6cf27e]
  [bt] (3) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(tvm::runtime::ThreadPool::RunWorker(int)+0x157) [0x7f8b63b33947]
  [bt] (4) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::function<void (int)>, bool)::{lambda()#1}> > >::_M_run()+0x31) [0x7f8b63b36521]
  [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd66f) [0x7f8b8d73566f]
  [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f8ba3f176db]
  [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f8ba425188f]
terminate called after throwing an instance of 'dmlc::Error'
  what():  [15:18:42] /home/sun/File/TVM/Projects/tvm/src/runtime/workspace_pool.cc:116: Check failed: allocated_.size() == 1 (3 vs. 1) : 
Stack trace:
  [bt] (0) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(tvm::runtime::WorkspacePool::Pool::Release(DLContext, tvm::runtime::DeviceAPI*)+0x7d7) [0x7f8b63b3b527]
  [bt] (1) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(tvm::runtime::WorkspacePool::~WorkspacePool()+0x37) [0x7f8b63b39937]
  [bt] (2) /lib/x86_64-linux-gnu/libc.so.6(__call_tls_dtors+0x3f) [0x7f8ba41738af]
  [bt] (3) /lib/x86_64-linux-gnu/libc.so.6(+0x43117) [0x7f8ba4173117]
  [bt] (4) /lib/x86_64-linux-gnu/libc.so.6(+0x4313a) [0x7f8ba417313a]
  [bt] (5) /home/sun/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2b64188) [0x7f8b90894188]
  [bt] (6) /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20) [0x7f8ba416ef20]
  [bt] (7) /tmp/tmpzoqyk19b/graphlib.o.so(+0x1f27e) [0x7f8b6b6cf27e]
  [bt] (8) /home/sun/File/TVM/Projects/tvm/build/libtvm.so(tvm::runtime::ThreadPool::Launch(int (*)(int, TVMParallelGroupEnv*, void*), void*, int, int)+0xfee) [0x7f8b63b3539e]


[1]    3914 abort (core dumped)  python3 -m pdb vta/tutorials/frontend/deploy_vision_on_vta.py

Do you have a solution or any suggestions?

We are still working on end-to-end support for VTA/TSIM. At the moment, only the unit tests are working.

With “TARGET” set to “tsim” in vta_config.json, I tried the demos in vta/tutorials and vta/tests/integration (including test_vta_insn.py). The error is:

File "/krzhang/tvm/tvm0809/tvm-0809-base/vta/python/vta/testing/simulator.py", line 43, in _load_all
m = tvm.module.load(lib[0], "vta-tsim")
IndexError: list index out of range

It seems that “libvta_hw.so” is missing.
Am I using too old a version?
Appreciate it!

You first have to build the hardware library by going to tvm/vta/hardware/chisel and running make; that should create libvta_hw.so.
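As a sanity check after running make, you can verify that the library landed where the tsim driver will look for it. A hedged sketch (find_hw_lib is a hypothetical helper; the exact search path used by vta/python/vta/testing/simulator.py may differ from your checkout):

```python
import glob
import os

def find_hw_lib(chisel_dir):
    """Return any built hardware libraries (libvta_hw.so / .dylib) in chisel_dir."""
    return sorted(glob.glob(os.path.join(chisel_dir, "libvta_hw.*")))

# e.g. find_hw_lib(os.path.expanduser("~/tvm/vta/hardware/chisel"))
```

If this returns an empty list, the IndexError above is expected, since the loader has no library to pass to tvm.module.load.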

Hope it helps!

Has there been any progress on this? Should we try to fix this ourselves, or is there active work going on now to fix it?

I’m working on a fix to this. Please stay tuned.

We are still interested in a fix here. If you let us know how far you have gotten, we can try to take it over from there.

Thanks for your attention. For now, you can successfully evaluate the test_benchmark_topi_conv2d.py script with the TSIM backend. The script exercises most of the workloads in ResNet-18, so I think the hardware implementation (Chisel VTA) along with TSIM-based simulation should be fine for now.

As for the problem in evaluating the deploy_vision_on_vta.py script, the reported error seems to be related to the integration with Relay and the runtime.

Note that you might need to duplicate the lines with pynq_1x16_i8w8a32_15_15_18_17 in the file ~/.tvm/tophub/vta_v0.06.log, and replace pynq_1x16_i8w8a32_15_15_18_17 with tsim_1x16_i8w8a32_15_15_18_17 in order to load pre-tuned schedule parameters correctly.
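The log edit described above can be scripted. A minimal sketch, assuming the tophub log is a plain text file with one schedule record per line (duplicate_entries is a hypothetical helper, not a TVM API):

```python
SRC_KEY = "pynq_1x16_i8w8a32_15_15_18_17"
DST_KEY = "tsim_1x16_i8w8a32_15_15_18_17"

def duplicate_entries(lines, src=SRC_KEY, dst=DST_KEY):
    """Append a tsim copy of every pynq schedule record."""
    return lines + [line.replace(src, dst) for line in lines if src in line]

# e.g. apply in place to ~/.tvm/tophub/vta_v0.06.log:
# path = os.path.expanduser("~/.tvm/tophub/vta_v0.06.log")
# with open(path) as f:
#     lines = f.readlines()
# with open(path, "w") as f:
#     f.writelines(duplicate_entries(lines))
```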

Thanks! I was able to get that running.

Hi @thierry, since @stevenmburns can also evaluate test_benchmark_topi_conv2d.py successfully with the TSIM backend, I think there is no hardware issue in Chisel VTA blocking end-to-end inference.

As we are heading towards enabling end-to-end inference with the deploy_vision_on_vta.py script, I observed that the segmentation fault actually takes place in the first layer of ResNet-18. I also observed that the first layer of ResNet-18 doesn’t actually run on VTA, since it comes before the “nn.max_pool2d” layer.

The stack trace looks like the following (some of the frames are in the generated code):

#0  0x00007fffba0a0cd6 in tvm::runtime::ThreadPool::RunWorker(int) (this=0x2032da8, worker_id=1)
    at /home/liangfu/workspace/tvm_upstream/src/runtime/thread_pool.cc:365
#1  0x00007fffba0a04f9 in tvm::runtime::ThreadPool::ThreadPool()::{lambda(int)#1}::operator()(int) const
    (__closure=0x22f0518, worker_id=1) at /home/liangfu/workspace/tvm_upstream/src/runtime/thread_pool.cc:291
...
#3 in __TVMBackendParallelLaunch
#4 in fused_nn_conv2d_add_nn_relu_compute_
#5 in fused_nn_conv2d_add_nn_relu
...

Do you have any suggestions for making this actually work?

We (Intel Strategic CAD Labs) would like to get the end-to-end flow working as well. A few observations and questions from our end:

  1. The end-to-end flow works with target “sim” but not with “tsim”. The Verilator simulation resets and performs zero or one clock ticks out of reset before the crash occurs. I get four separate core dumps before the threading code produces a stack trace. When I run under gdb, I see the first core dump is in the code generated by the runtime (libgraph, I think). There are no debugging symbols to see exactly what happened. (Perhaps there is a way to get more debug visibility here.) Why would sim work and tsim not, before the tsim simulator starts doing anything real? Any thoughts?
  2. Does the end-to-end flow work for you on the DE10-Nano, or do you get a similar runtime core dump? We have a DE10-Nano up and running, but I don’t know the results of running the end-to-end flow yet. Is it expected to work? If it does, what is different about this environment from the tsim/Verilator setup that could cause the difference?

Thanks for your help.

@stevenmburns Thanks for the comments. Unfortunately, I don’t have a fix either.

  1. My previous post was a diagnosis toward a fix that brings TSIM support to the end-to-end workflow.
  2. It doesn’t work on my DE10-Nano either. The error takes place in the first layer of ResNet-18 inference, which is designed to run on the CPU instead, if I understand correctly.

@vegaluis @thierry do you have any suggestions?

Specifically, for an inference run of the end-to-end workflow, I compared the intermediate representations and observed that the Relay IR and Halide IR are exactly the same between the FSim and TSim backends; however, the LLVM IR is not.
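For anyone repeating this comparison: once you have the two IR dumps as strings (for LLVM modules, lib.get_source("ll") is the usual way to obtain them, though how you capture one dump per backend depends on your build script), the standard library can diff them. A minimal sketch (ir_diff is a hypothetical helper):

```python
import difflib

def ir_diff(fsim_ir, tsim_ir):
    """Return the unified diff between two IR dumps as a list of lines."""
    return list(difflib.unified_diff(
        fsim_ir.splitlines(), tsim_ir.splitlines(),
        fromfile="fsim.ll", tofile="tsim.ll", lineterm=""))
```

An empty result means the two dumps are identical; otherwise the +/- lines point directly at the diverging instructions.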