Destructing a graph module after main() causes a segmentation fault

ljxing · October 19, 2019, 3:23am

I need to register some models in a global static registry for serving, but I run into a segmentation fault. Here is a minimal reproducible example:

#include <fstream>
#include <string>
#include <iostream>

#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

tvm::runtime::Module load_model(const std::string &modelFolder,
                                const int deviceType, const int deviceID)
{
    // load graph structure
    std::ifstream graphStream(modelFolder + "/deploy_graph.json", std::ios::in);
    std::string graph((std::istreambuf_iterator<char>(graphStream)),
                      std::istreambuf_iterator<char>());
    graphStream.close();

    // load module op library
    tvm::runtime::Module lib =
        tvm::runtime::Module::LoadFromFile(modelFolder + "/deploy_lib.so");

    // create graph runtime
    tvm::runtime::Module model =
        (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(
            graph, lib, deviceType, deviceID);

    // load parameters
    std::ifstream paramsStream(modelFolder + "/deploy_param.params",
                               std::ios::binary);
    std::string paramsData((std::istreambuf_iterator<char>(paramsStream)),
                           std::istreambuf_iterator<char>());
    paramsStream.close();

    // parameters need to be TVMByteArray type to indicate the binary data
    TVMByteArray paramsArr;
    paramsArr.data = paramsData.c_str();
    paramsArr.size = paramsData.length();

    // get the load_params function from the module and load the parameters
    tvm::runtime::PackedFunc load_params_func =
        model.GetFunction("load_params");
    load_params_func(paramsArr);

    return model;
}

class A
{
   public:
    A(const std::string &modelFolder, const int deviceType, const int deviceID)
    {
        model_ = load_model(modelFolder, deviceType, deviceID);
    }

   private:
    tvm::runtime::Module model_;
};

A *pA = nullptr;

int main(int argc, char **argv)
{
    pA = new A(argv[1], (int)kDLCPU, 0);

    return 0;
}

__attribute__((destructor)) void after_main()
{
    if (nullptr != pA)
    {
        std::cout << "delete pA" << std::endl;
        delete pA;
        std::cout << "done" << std::endl;
    }
}

I can work around this by releasing all models at the end of main(), but I am interested in the reason, because doing the same thing with libtorch and TensorFlow works fine.

I would appreciate any feedback.

ljxing · October 19, 2019, 5:13am

GDB backtrace:
#0 0x00007fa1d3d6e92e in tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::NDArray::Container*) ()
from ./bundle_lib/libtvm_runtime.so
(gdb) bt
#0 0x00007fa1d3d6e92e in tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::NDArray::Container*) ()
from ./bundle_lib/libtvm_runtime.so
#1 0x00007fa1d3dbfa27 in tvm::runtime::GraphRuntime::~GraphRuntime() () from ./bundle_lib/libtvm_runtime.so
#2 0x0000000000403eda in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x18847d0)
at /usr/include/c++/4.9/bits/shared_ptr_base.h:149
#3 0x0000000000402538 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x186f3a8,
__in_chrg=) at /usr/include/c++/4.9/bits/shared_ptr_base.h:666
#4 std::__shared_ptr<tvm::runtime::ModuleNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x186f3a0,
__in_chrg=) at /usr/include/c++/4.9/bits/shared_ptr_base.h:914
#5 std::shared_ptr<tvm::runtime::ModuleNode>::~shared_ptr (this=0x186f3a0, __in_chrg=)
at /usr/include/c++/4.9/bits/shared_ptr.h:93
#6 tvm::runtime::Module::~Module (this=0x186f3a0, __in_chrg=)
at risk_control/tvm_runtime/include/tvm/runtime/module.h:46
#7 A::~A (this=0x186f3a0, __in_chrg=) at vil/examples/src/debug.cpp:56
#8 after_main () at vil/examples/src/debug.cpp:82
#9 0x00007fa1d467e1fa in ?? () from ./bundle_lib/ld-linux-x86-64.so.2
#10 0x00007fa1b8b9cb29 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x00007fa1b8b9cb75 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x00007fa1b8b86b4c in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x00000000004026c4 in _start ()

tqchen · October 19, 2019, 4:55pm

The main reason is likely the use of a global singleton for DeviceApi in the runtime, which might get destructed early because the static destruction order can differ. Similar things might happen with other related runtimes as well (e.g. CUDA).

To resolve this problem, we would have to modify the device API so that each NDArray retains a strong reference to it, which prevents the device API from being deallocated before all NDArrays have been deallocated.
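For illustration, here is a minimal sketch of the strong-reference idea in plain C++. The `DeviceApi` and `Array` classes below are hypothetical stand-ins, not TVM's actual classes, and `RunDemo` is a made-up helper. The point is only that an array holding a `std::shared_ptr` to the device API keeps it alive until the array itself has been freed, even if the original owner lets go first.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device API singleton (not TVM's real class).
class DeviceApi {
 public:
  explicit DeviceApi(std::vector<std::string>* log) : log_(log) {}
  ~DeviceApi() { log_->push_back("DeviceApi destroyed"); }
  void Free(const std::string& name) { log_->push_back("free " + name); }

 private:
  std::vector<std::string>* log_;
};

// Hypothetical stand-in for an array whose memory is freed through the device API.
class Array {
 public:
  Array(std::string name, std::shared_ptr<DeviceApi> api)
      : name_(std::move(name)), api_(std::move(api)) {}
  // The destructor needs the device API to still be alive.
  ~Array() { api_->Free(name_); }

 private:
  std::string name_;
  std::shared_ptr<DeviceApi> api_;  // strong reference keeps the API alive
};

// Simulates the "bad" order: the owner of the device API goes away first.
std::vector<std::string> RunDemo() {
  std::vector<std::string> log;
  {
    auto api = std::make_shared<DeviceApi>(&log);
    auto array = std::make_unique<Array>("weights", api);
    api.reset();          // the original owner drops its reference early
    assert(log.empty());  // the API is still alive because the array holds it
    array.reset();        // frees the array, then releases the last reference
  }
  return log;
}
```

With a raw pointer or a plain global instead of the `shared_ptr`, the `Free` call in `~Array` would touch an already-destroyed object, which is the kind of failure the backtrace above points at.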

The simplest solution, though, is to unload the module explicitly in the main function.
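A rough sketch of that workaround for a global registry, written without TVM so it stays self-contained. `Model`, `ModelRegistry`, and `ServeAndShutDown` are hypothetical names; `Model` stands in for whatever holds the `tvm::runtime::Module`. The registry exposes a `Clear()` that is called while the program is still inside `main()`, so nothing is left for static destruction to clean up.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an object that owns a runtime module.
struct Model {
  static int live;  // number of Model objects currently alive
  Model() { ++live; }
  ~Model() { --live; }
};
int Model::live = 0;

// Hypothetical global registry of served models.
class ModelRegistry {
 public:
  static ModelRegistry& Global() {
    static ModelRegistry instance;
    return instance;
  }
  void Register(const std::string& name) {
    models_[name] = std::make_unique<Model>();
  }
  // Destroys every registered model right now instead of at process exit.
  void Clear() { models_.clear(); }
  std::size_t Size() const { return models_.size(); }

 private:
  std::map<std::string, std::unique_ptr<Model>> models_;
};

// What the body of main() would do: register, serve, then release explicitly.
int ServeAndShutDown() {
  ModelRegistry& registry = ModelRegistry::Global();
  registry.Register("model_a");
  registry.Register("model_b");
  // ... serve requests here ...
  registry.Clear();    // release while the runtime's globals are still alive
  return Model::live;  // number of models that are still alive
}
```

In the example from the first post, the equivalent would be to `delete pA` and set it to `nullptr` before `return 0;` in `main()` rather than in the `__attribute__((destructor))` function.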

ljxing · October 21, 2019, 3:38am

Thank you very much.

adb · November 18, 2020, 5:54pm

Hi @tqchen and @ljxing,

We are experiencing a similar issue while running on an Android device with the latest TVM and are looking for some insight.

The following is on behalf of a teammate.

We have a similar situation: we store the GraphRuntime module in a global structure and delete it later in a separate function call, which triggers a segmentation fault. The segmentation fault appears to happen when the GraphRuntime destructor is called, which triggers the deletion of the model library module and, subsequently, the call to dlclose(). In our case we keep a persistent process active and create and destroy the runtime and models as requests come in. Removing the call to dlclose() avoids the segmentation fault, but it keeps the model library loaded in the process.

@tqchen Could you clarify what you mean by this statement?

tqchen · November 18, 2020, 11:48pm

Ideally, we want to be able to destruct all the resources before the program exits (or before dlclose). Explicitly calling the destructor in main, or before dlclose, would resolve the problem.
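One way to make the "destruct before dlclose" order hard to get wrong is to rely on C++ member destruction order. The sketch below is plain C++ with hypothetical `Library`, `Runtime`, `LoadedModel`, and `HandleRequest` names, not TVM's API: the library handle is declared before the runtime, so the runtime is always destroyed first and the library is closed last.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a dynamically loaded model library.
struct Library {
  std::vector<std::string>* log;
  // A real implementation would call dlclose() on its handle here.
  ~Library() { log->push_back("dlclose"); }
};

// Hypothetical stand-in for a runtime that uses code and data from the library.
struct Runtime {
  std::vector<std::string>* log;
  ~Runtime() { log->push_back("runtime destroyed"); }
};

// Members are destroyed in reverse declaration order:
// `runtime` goes first, `library` goes last.
struct LoadedModel {
  std::unique_ptr<Library> library;
  std::unique_ptr<Runtime> runtime;
};

// Creates and destroys a model the way a per-request handler might.
std::vector<std::string> HandleRequest() {
  std::vector<std::string> log;
  {
    LoadedModel model;
    model.library.reset(new Library{&log});
    model.runtime.reset(new Runtime{&log});
    // ... run inference here ...
  }  // model goes out of scope: runtime is destroyed, then the library
  return log;
}
```

This only fixes the ordering inside one holder object. It assumes nothing else in the process still refers to memory or functions owned by the library when it is closed.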