[Tutorial] VTA MxNet

Good morning,

Following the VTA MxNet tutorial, I get a segmentation fault.
The segmentation fault occurs during the execution of GraphRuntime::Run, specifically in the loop that executes op_execs_[i]. The first three ops (of 100) are not executed; the fourth starts executing and triggers the segmentation fault.
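A minimal sketch for getting a bit more context on a crash like this, assuming a standard CPython 3 interpreter: the standard-library faulthandler module can dump the Python-level traceback of every thread when the process receives SIGSEGV. The helper name enable_crash_traceback is hypothetical and only for illustration. It does not show the native frames inside libtvm.so or the generated graphlib, only which Python call was active when the fault hit.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Make CPython dump Python-level tracebacks for all threads on fatal signals (e.g. SIGSEGV).

    `stream` must be a real file object with a file descriptor; defaults to sys.stderr.
    Returns True once the handler is installed.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Calling enable_crash_traceback() at the top of the tutorial script, before the graph runtime is run, should be enough. Running the script as python3 -X faulthandler has the same effect without code changes.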

Could anyone help me with this, or shed any light on why this is happening?

Thank you very much for your time.

Error trace:

Segmentation fault: 11

Segmentation fault: 11

Segmentation fault: 11

Stack trace:
[bt] (0) /home/mike/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2b64150) [0x7eff5ff51150]
[bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7eff7c2924b0]
[bt] (2) /tmp/tmptxfa_dss/graphlib.o.so(+0x174bf) [0x7eff7019e4bf]
[bt] (3) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::ThreadPool::Launch(int (*)(int, TVMParallelGroupEnv*, void*), void*, int, int)+0xfd1) [0x7eff31d8d621]
[bt] (4) /scratch/mike/tvm/build/libtvm.so(TVMBackendParallelLaunch+0x63) [0x7eff31d8af03]
[bt] (5) /tmp/tmptxfa_dss/graphlib.o.so(+0x170eb) [0x7eff7019e0eb]
[bt] (6) /tmp/tmptxfa_dss/graphlib.o.so(fused_nn_conv2d_add_nn_relu+0x3bd) [0x7eff7019dcad]
[bt] (7) /scratch/mike/tvm/build/libtvm.so(+0xbea331) [0x7eff31d84331]
[bt] (8) /scratch/mike/tvm/build/libtvm.so(+0xc3f6a7) [0x7eff31dd96a7]
Stack trace:
[bt] (0) /home/mike/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2b64150) [0x7eff5ff51150]
[bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7eff7c2924b0]
[bt] (2) /tmp/tmptxfa_dss/graphlib.o.so(+0x174bf) [0x7eff7019e4bf]
[bt] (3) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::ThreadPool::RunWorker(int)+0x1b9) [0x7eff31d8ba19]
[bt] (4) /scratch/mike/tvm/build/libtvm.so(std::thread::_Impl<std::_Bind_simple<tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::function<void (int)>, bool)::{lambda()#1} ()> >::M_run()+0x31) [0x7eff31d812e1]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7eff755cfc80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7eff7c62e6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7eff7c36441d]
terminate called after throwing an instance of 'dmlc::Error'
what(): [11:27:34] /scratch/mike/tvm/src/runtime/workspace_pool.cc:116: Check failed: allocated_.size() == 1 (3 vs. 1) :
Stack trace:
[bt] (0) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::WorkspacePool::Pool::Release(DLContext, tvm::runtime::DeviceAPI*)+0x652) [0x7eff31d9a1d2]
[bt] (1) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::WorkspacePool::~WorkspacePool()+0x3f) [0x7eff31d9887f]
[bt] (2) /lib/x86_64-linux-gnu/libc.so.6(__call_tls_dtors+0x3f) [0x7eff7c2975ff]
[bt] (3) /lib/x86_64-linux-gnu/libc.so.6(+0x39f27) [0x7eff7c296f27]
[bt] (4) /lib/x86_64-linux-gnu/libc.so.6(+0x3a045) [0x7eff7c297045]
[bt] (5) /home/mike/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2b64188) [0x7eff5ff51188]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7eff7c2924b0]
[bt] (7) /tmp/tmptxfa_dss/graphlib.o.so(+0x174bf) [0x7eff7019e4bf]
[bt] (8) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::ThreadPool::Launch(int (*)(int, TVMParallelGroupEnv*, void*), void*, int, int)+0xfd1) [0x7eff31d8d621]


What are you running this on? The Pynq FPGA?

I recently ran into a similar problem while going through the VTA MxNet tutorial. Is there any update on this issue? Here is the crash report:

Process:               Python [18397]
Path:                  /Library/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python
Identifier:            Python
Version:               3.8.0 (3.8.0)
Code Type:             X86-64 (Native)
Parent Process:        zsh [11738]
Responsible:           iTerm2 [1025]
User ID:               501

Date/Time:             2020-08-13 22:10:24.361 +0800
OS Version:            Mac OS X 10.15.6 (19G2021)
Report Version:        12
Bridge OS Version:     4.6 (17P6610)
Anonymous UUID:        5157B874-0DFA-2106-FC9E-40E18F1CCB5E


Time Awake Since Boot: 27000 seconds

System Integrity Protection: enabled

Crashed Thread:        5

Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       EXC_I386_GPFLT
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called
Python(18397,0x700004953000) malloc: Incorrect checksum for freed object 0x7f92263fff70: probably modified after being freed.
Corrupt value: 0x6572756c69614620

Error log:

Reconfigured FPGA and RPC runtime in 2.95s!
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.

Segmentation fault: 11


Segmentation fault: 11


Segmentation fault: 11


Segmentation fault: 11

Stack trace:
  [bt] (0) 1   libmxnet.so                         0x000000011a1c6120 mxnet::Storage::Get() + 11632
  [bt] (1) 2   libsystem_platform.dylib            0x00007fff6ebc05fd _sigtramp + 29
  [bt] (2) 3   ???                                 0x0000000000003ffb 0x0 + 16379
  [bt] (3) 4   ???                                 0x00000001371b1494 0x0 + 5219488916
  [bt] (4) 5   libtvm.dylib                        0x0000000132184d4c tvm::relay::Interpreter::InvokePrimitiveOp(tvm::relay::Function const&, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&) + 3516
  [bt] (5) 6   libtvm.dylib                        0x000000013218305b tvm::relay::Interpreter::Invoke(tvm::relay::InterpreterClosure const&, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::relay::Var const&) + 171
  [bt] (6) 7   libtvm.dylib                        0x000000013217e291 tvm::relay::Interpreter::VisitExpr_(tvm::relay::CallNode const*) + 961
  [bt] (7) 8   libtvm.dylib                        0x0000000132181f08 tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>::InitVTable()::'lambda4'(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>*)::__invoke(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>*) + 24
  [bt] (8) 9   libtvm.dylib                        0x00000001321808bf tvm::NodeFunctor<tvm::runtime::ObjectRef (tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>*)>::operator()(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>*) const + 255
Stack trace:
  [bt] (0) 1   libmxnet.so                         0x000000011a1c6120 mxnet::Storage::Get() + 11632
  [bt] (1) 2   libsystem_platform.dylib            0x00007fff6ebc05fd _sigtramp + 29
  [bt] (2) 3   ???                                 0x0000000000000000 0x0 + 0
  [bt] (3) 4   libtvm.dylib                        0x00000001322d6d0e void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::__1::function<void (int)>, bool)::'lambda'()> >(void*) + 62
  [bt] (4) 5   libsystem_pthread.dylib             0x00007fff6ebcc109 _pthread_start + 148
  [bt] (5) 6   libsystem_pthread.dylib             0x00007fff6ebc7b8b thread_start + 15
Stack trace:
  [bt] (0) 1   libmxnet.so                         0x000000011a1c6120 mxnet::Storage::Get() + 11632
  [bt] (1) 2   libsystem_platform.dylib            0x00007fff6ebc05fd _sigtramp + 29
  [bt] (2) 3   ???                                 0x0000000000000000 0x0 + 0
  [bt] (3) 4   libtvm.dylib                        0x00000001322d6d0e void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::__1::function<void (int)>, bool)::'lambda'()> >(void*) + 62
  [bt] (4) 5   libsystem_pthread.dylib             0x00007fff6ebcc109 _pthread_start + 148
  [bt] (5) 6   libsystem_pthread.dylib             0x00007fff6ebc7b8b thread_start + 15
Stack trace:
  [bt] (0) 1   libmxnet.so                         0x000000011a1c6120 mxnet::Storage::Get() + 11632
  [bt] (1) 2   libsystem_platform.dylib            0x00007fff6ebc05fd _sigtramp + 29
  [bt] (2) 3   ???                                 0x0000000000000000 0x0 + 0
  [bt] (3) 4   libtvm.dylib                        0x00000001322d6d0e void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::__1::function<void (int)>, bool)::'lambda'()> >(void*) + 62
  [bt] (4) 5   libsystem_pthread.dylib             0x00007fff6ebcc109 _pthread_start + 148
  [bt] (5) 6   libsystem_pthread.dylib             0x00007fff6ebc7b8b thread_start + 15
Python(18397,0x700004953000) malloc: Incorrect checksum for freed object 0x7f92263fff70: probably modified after being freed.
Corrupt value: 0x6572756c69614620
Python(18397,0x700004953000) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    18397 abort      python3.8 ./deploy_classification.py

Thanks for reporting the issue, @Groupsun. I was able to reproduce it, and I’m investigating the bug.

I found that this PR has introduced the faulty behavior: https://github.com/apache/incubator-tvm/pull/6195

I’ll need to investigate more tomorrow or Wednesday. In the meantime, if you need to get the example working, you’ll need to revert to d892881c4cc8c9a29bc03233aeac2b1532a9c6891

Any update about this issue @thierry?

Yes, the bug was fixed, but I forgot to update the thread. See: https://github.com/apache/incubator-tvm/pull/6377

Thank you @thierry !

But when I try it (commit 693c0de, or the latest commit), I run into another error when launching the RPC server (target side):

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/xilinx/tvm/vta/python/vta/__init__.py", line 27, in <module>
    from .bitstream import get_bitstream_path, download_bitstream
  File "/home/xilinx/tvm/vta/python/vta/bitstream.py", line 23, in <module>
    from tvm.contrib.download import download
  File "/home/xilinx/tvm/python/tvm/__init__.py", line 64, in <module>
    from . import hybrid
  File "/home/xilinx/tvm/python/tvm/hybrid/__init__.py", line 19, in <module>
    from .utils import create_module, ashybrid, script
  File "/home/xilinx/tvm/python/tvm/hybrid/utils.py", line 23, in <module>
    from .parser import from_source
  File "/home/xilinx/tvm/python/tvm/hybrid/parser.py", line 24, in <module>
    from typed_ast import ast3 as ast
ModuleNotFoundError: No module named 'typed_ast'

Do you have any idea?

Ah yes, you’ll need to install typed_ast with: pip install typed_ast
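To confirm that the interpreter used to launch the RPC server can actually see the package, a small check like the following may help. It is only a sketch: missing_modules is a hypothetical helper name, and it relies solely on the standard-library importlib.util.find_spec, which returns None for a top-level module that cannot be found.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is being checked; pip must install into this same one.
    print("interpreter:", sys.executable)
    missing = missing_modules(["typed_ast"])
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("typed_ast is importable")
```

Run it with the same python3 binary that launches the RPC server. If it reports typed_ast as missing after installation, pip most likely installed into a different interpreter, in which case python3 -m pip install typed_ast targets the right one.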

Thank you @thierry! But I am having a lot of difficulty installing typed_ast on the Ultra96 board, especially since Pynq uses Python 2.7.