Python debugger segfaults with tvm

Hi. Does anyone use the Python debugger with tvm? I get a segfault every time I try to enter trace mode, for example in the program below. If I comment out import tvm and restart ipython, the segfault goes away.

import pdb
import tvm

def test():
  x=1+3
  pdb.set_trace()
  return x
...

Thread 1 "python3.6" received signal SIGSEGV, Segmentation fault.
0x0000000000579b80 in PyModule_GetState ()
(gdb) bt
#0  0x0000000000579b80 in PyModule_GetState ()
#1  0x00007f43ba9fc67e in ?? () from /usr/lib/python3.6/lib-dynload/readline.cpython-36m-x86_64-linux-gnu.so
#2  0x00007f43c21ec3e1 in rl_initialize () from /usr/lib/x86_64-linux-gnu/libedit.so.2
#3  0x00007f43ba9fc485 in PyInit_readline () from /usr/lib/python3.6/lib-dynload/readline.cpython-36m-x86_64-linux-gnu.so
#4  0x0000000000604ea7 in _PyImport_LoadDynamicModuleWithSpec ()
#5  0x0000000000605f66 in ?? ()
#6  0x00000000005871a0 in PyCFunction_Call ()
#7  0x000000000054840c in _PyEval_EvalFrameDefault ()
#8  0x00000000005414d0 in ?? ()
#9  0x0000000000539912 in ?? ()

Regards

Two workarounds which work for me:

  1. Start Python with -m pdb and raise an exception instead of calling set_trace(). Or use -m pdb -c "b ..." to set a breakpoint from the command line.

  2. Not a debugger, but from IPython import embed; embed() works, which is surprising since it should be using readline too.

Unfortunately, __import__('IPython').core.debugger.Pdb(color_scheme='Linux').set_trace() fails when loading readline as well, even though the documentation says:

Modified from the standard pdb.Pdb class to avoid including readline, so that the command line completion of other programs which include this isn’t damaged.

I’ve tried the other core.debugger and terminal.debugger classes; all of them have the same problem.

To get full debugging, pudb (https://documen.tician.de/pudb/index.html) works for me.

I’d like to bump this. This is a real pain as someone who develops on Ubuntu. I’ll try to dig into it if I ever have time, but if anyone has any ideas, please let me know!

I did a bit of debugging today. I posted a full SO question (with debugging results) here: https://stackoverflow.com/questions/57015349/pdb-set-trace-segfaulting-after-importing-tvm-on-ubuntu

A simple (and ugly) workaround is to import readline before importing tvm.

import readline
import tvm
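If the same script also has to run where the readline module is unavailable, the early import can be guarded. A minimal sketch; the helper name preload_readline is hypothetical and not part of TVM, and import tvm is commented out so the snippet runs anywhere.

```python
import importlib


def preload_readline():
    """Import readline before any library that may pull in libedit.

    Returns True if readline was imported, False if it is unavailable.
    """
    try:
        importlib.import_module("readline")
        return True
    except ImportError:
        return False


preload_readline()
# import tvm  # import TVM only after this point
```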

Root issue

I happened to notice that libtvm.so picks up a dependency on libedit.so once it is linked against libLLVM.so, which might be the root cause of the issue.

I have no idea why libLLVM.so is linked against libedit. @zhiics @were @haichen any ideas?
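One way to check this from inside a running Python process, assuming Linux, is to scan /proc/self/maps for mapped shared objects whose names mention libedit or libreadline. This is a sketch only; the helper names find_libs_in_maps and loaded_line_editors are made up for illustration.

```python
import os


def find_libs_in_maps(lines, keywords=("libedit", "libreadline")):
    """Return sorted paths of mapped files whose base name contains a keyword."""
    found = set()
    for line in lines:
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if any(k in os.path.basename(path) for k in keywords):
            found.add(path)
    return sorted(found)


def loaded_line_editors(maps_path="/proc/self/maps"):
    """Scan this process's memory map; returns [] where /proc is unavailable."""
    if not os.path.exists(maps_path):
        return []
    with open(maps_path) as maps:
        return find_libs_in_maps(maps)
```

Calling loaded_line_editors() right after import tvm, and again after import readline, should show which line-editing library was mapped at each point.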

Build without LLVM

# ipdb does not crash
$ python -c "import tvm; import ipdb; ipdb.set_trace()"
--Return--
None
> <string>(1)<module>()

ipdb>

# not linked against libedit.so
$ ldd libtvm.so
        linux-vdso.so.1 (0x00007ffd1f7ae000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdeeb4bf000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fdeeb136000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fdeead98000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fdeeab80000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdeea961000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdeea570000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdeec6b6000)

Build with LLVM

# ipdb crashes immediately
$ python -c "import tvm; import ipdb; ipdb.set_trace()"
[1]    21458 segmentation fault (core dumped)  python -c "import tvm; import ipdb; ipdb.set_trace()"

# linked against libedit.so
$ ldd libtvm.so
        linux-vdso.so.1 (0x00007ffd9b5f1000)
        libLLVM-8.so.1 => /usr/lib/llvm-8/lib/libLLVM-8.so.1 (0x00007fe136a4d000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe136849000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe1364c0000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe136122000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe135f0a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe135ceb000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe1358fa000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe13b731000)
        libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007fe1356f2000)
        libedit.so.2 => /usr/lib/x86_64-linux-gnu/libedit.so.2 (0x00007fe1354bb000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fe13529e000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe135096000)
        libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007fe134e6c000)

Let’s take a look at libLLVM-8.so

$ ldd /usr/lib/llvm-8/lib/libLLVM-8.so
        linux-vdso.so.1 (0x00007ffe15464000)
        libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007f75d039f000)
        libedit.so.2 => /usr/lib/x86_64-linux-gnu/libedit.so.2 (0x00007f75d0168000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f75cff4b000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f75cfd43000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f75cfb3f000)
        libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f75cf915000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f75cf6f6000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f75cf358000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f75cefcf000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f75cedb7000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f75ce9c6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f75d41f4000)
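The manual ldd checks above can be scripted. The sketch below assumes ldd is on PATH and that its output has the shape shown in this thread; parse_ldd and links_libedit are hypothetical helper names.

```python
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (or None)."""
    deps = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            rest = rest.strip()
            tokens = rest.split()
            if not tokens or rest.startswith("not found") or tokens[0].startswith("("):
                deps[name.strip()] = None
            else:
                deps[name.strip()] = tokens[0]
        else:
            # Entries without "=>", such as linux-vdso.so.1 or the loader.
            deps[line.split()[0]] = None
    return deps


def links_libedit(lib_path):
    """Run ldd on lib_path and report whether libedit is among its dependencies."""
    output = subprocess.check_output(["ldd", lib_path], universal_newlines=True)
    return any(name.startswith("libedit.so") for name in parse_ldd(output))
```

For example, links_libedit("libtvm.so") would be expected to return True for the LLVM build shown above and False for the build without LLVM.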

Yet another workaround

After PR 3954 is merged, we can pass extra arguments to llvm-config through the cmake configuration. Then, in config.cmake, you can force linking against the static LLVM libraries, e.g.

set(USE_LLVM "/usr/bin/llvm-config-8 --ignore-libllvm")

In that case, we can invoke pdb normally without having to worry about core dumps…

$ python -c "import tvm; import ipdb; ipdb.set_trace()"
--Return--
None
> <string>(1)<module>()

ipdb>

Exiting Debugger.

Glad this is getting some activity. I work on a Mac now, and don’t encounter this issue anymore, but I still would love to see it resolved for all of our Linux devs out there.

Hey @gussmith23, has anyone run into any further issues with my solution above?

@junrushao I checked my LLVM library: libedit.so.2 is a dependency of neither libLLVM-5.0.so nor libtvm.so.

What should I do?

$ ldd libLLVM-5.0.so
        linux-vdso.so.1 (0x00007ffe15464000)
        libffi.so.6 => /lib64/libffi.so.6 (0x00007f75d039f000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f75cfd43000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f75cfb3f000)
        libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007f75cf915000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f75cf6f6000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f75cf358000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f75cefcf000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f75cedb7000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f75ce9c6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f75d41f4000)

@heliqi could you describe your problem?

@junrushao Running python3 xxx.py on CentOS prints "segmentation fault".

The failing call is “autotvm.task.extract_from_program(xxx)”.

My question is how to debug with tvm. gdb is not easy to use.

This doesn’t seem to be related to my solution, which targets the symbol-conflict crash that happens when libtvm.so is loaded.

If you would like to debug in Python, ipdb is highly recommended.

Thanks, I misread it!

I sent a patch: https://github.com/apache/incubator-tvm/pull/5685

Maybe our installation guide should mention some of these LLVM-caused crashes.

FWIW, this seems to be what’s needed to get this ignored test working in the Rust bindings:

(in rust/tvm/src/python.rs)

    #[ignore]
    #[test]
    fn test_run() -> Result<()> {
        load().unwrap();
        Ok(())
    }
}

This is significant. Without this fix, a large part of the Rust bindings won’t work (specifically, actually compiling and running code from Rust won’t work!)

I’m not sure if this is known already, or where it would be good to document this fact. cc @jroesch

Sorry to bump an old thread, but this seems important enough to mention. THANK YOU SO MUCH @junrushao !!! Great sleuthing on this :male_detective: