Destructing a graph module after main() causes a segmentation fault

I need to register some models in a global static registry for serving, but I run into a segmentation fault. Here is a minimal reproducible example:

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

tvm::runtime::Module load_model(const std::string &modelFolder,
                                const int deviceType, const int deviceID)
{
    // load graph structure
    std::ifstream graphStream(modelFolder + "/deploy_graph.json", std::ios::in);
    std::string graph((std::istreambuf_iterator<char>(graphStream)),
                      std::istreambuf_iterator<char>());
    graphStream.close();

    // load module op library
    tvm::runtime::Module lib =
        tvm::runtime::Module::LoadFromFile(modelFolder + "/deploy_lib.so");

    // create graph runtime
    tvm::runtime::Module model =
        (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(
            graph, lib, deviceType, deviceID);

    // load parameters
    std::ifstream paramsStream(modelFolder + "/deploy_param.params",
                               std::ios::binary);
    std::string paramsData((std::istreambuf_iterator<char>(paramsStream)),
                           std::istreambuf_iterator<char>());
    paramsStream.close();

    // parameters need to be TVMByteArray type to indicate the binary data
    TVMByteArray paramsArr;
    paramsArr.data = paramsData.c_str();
    paramsArr.size = paramsData.length();

    // get the load_params function from the module and load the parameters
    tvm::runtime::PackedFunc load_params_func =
        model.GetFunction("load_params");
    load_params_func(paramsArr);

    return model;
}

class A
{
   public:
    A(const std::string &modelFolder, const int deviceType, const int deviceID)
    {
        model_ = load_model(modelFolder, deviceType, deviceID);
    }

   private:
    tvm::runtime::Module model_;
};

A *pA = nullptr;

int main(int argc, char **argv)
{
    pA = new A(argv[1], (int)kDLCPU, 0);

    return 0;
}

__attribute__((destructor)) void after_main()
{
    if (nullptr != pA)
    {
        std::cout << "delete pA" << std::endl;
        delete pA;
        std::cout << "done" << std::endl;
    }
}

I can work around this by releasing all models at the end of main(). But I am interested in the reason, because doing the same thing with libtorch and TensorFlow works fine.

I would appreciate any feedback.

The GDB backtrace shows:
#0 0x00007fa1d3d6e92e in tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::NDArray::Container*) ()
from ./bundle_lib/libtvm_runtime.so
(gdb) bt
#0 0x00007fa1d3d6e92e in tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::NDArray::Container*) ()
from ./bundle_lib/libtvm_runtime.so
#1 0x00007fa1d3dbfa27 in tvm::runtime::GraphRuntime::~GraphRuntime() () from ./bundle_lib/libtvm_runtime.so
#2 0x0000000000403eda in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x18847d0)
at /usr/include/c++/4.9/bits/shared_ptr_base.h:149
#3 0x0000000000402538 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x186f3a8,
__in_chrg=) at /usr/include/c++/4.9/bits/shared_ptr_base.h:666
#4 std::__shared_ptr<tvm::runtime::ModuleNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x186f3a0,
__in_chrg=) at /usr/include/c++/4.9/bits/shared_ptr_base.h:914
#5 std::shared_ptr<tvm::runtime::ModuleNode>::~shared_ptr (this=0x186f3a0, __in_chrg=)
at /usr/include/c++/4.9/bits/shared_ptr.h:93
#6 tvm::runtime::Module::~Module (this=0x186f3a0, __in_chrg=)
at risk_control/tvm_runtime/include/tvm/runtime/module.h:46
#7 A::~A (this=0x186f3a0, __in_chrg=) at vil/examples/src/debug.cpp:56
#8 after_main () at vil/examples/src/debug.cpp:82
#9 0x00007fa1d467e1fa in ?? () from ./bundle_lib/ld-linux-x86-64.so.2
#10 0x00007fa1b8b9cb29 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x00007fa1b8b9cb75 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x00007fa1b8b86b4c in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x00000000004026c4 in _start ()

The main reason is probably the use of a global singleton for DeviceApi in the runtime, which might get destructed early because of the static destruction order. Similar things might happen with other related runtimes as well (e.g. CUDA).

To resolve this problem, we would have to modify the device API so that each NDArray retains a strong reference to it, which prevents the device API from being deallocated before all NDArrays have been deallocated.
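
The strong-reference idea can be sketched in plain C++ without TVM. This is only an illustration of the ownership pattern, not the actual runtime code: FakeDeviceAPI, FakeArray and DestroyApiHandleBeforeArray are hypothetical stand-ins invented for this example, and std::shared_ptr is assumed as the reference-counting mechanism.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime's device API object (not a TVM type).
struct FakeDeviceAPI {
    explicit FakeDeviceAPI(std::vector<std::string>* log) : log_(log) {}
    ~FakeDeviceAPI() { log_->push_back("device api destroyed"); }
    void Free(int id) { log_->push_back("free buffer " + std::to_string(id)); }
    std::vector<std::string>* log_;
};

// Hypothetical stand-in for NDArray: it keeps the device API alive
// by holding a strong (shared) reference to it.
class FakeArray {
   public:
    FakeArray(std::shared_ptr<FakeDeviceAPI> api, int id)
        : api_(std::move(api)), id_(id) {}
    FakeArray(const FakeArray&) = delete;
    FakeArray& operator=(const FakeArray&) = delete;
    // The device API is guaranteed to be alive here, because api_ is only
    // released after this destructor body has finished.
    ~FakeArray() { api_->Free(id_); }

   private:
    std::shared_ptr<FakeDeviceAPI> api_;
    int id_;
};

// Drops the "global" handle first, then the array, and returns the event log.
std::vector<std::string> DestroyApiHandleBeforeArray() {
    std::vector<std::string> log;
    std::shared_ptr<FakeDeviceAPI> api = std::make_shared<FakeDeviceAPI>(&log);
    std::unique_ptr<FakeArray> array(new FakeArray(api, 7));

    api.reset();    // the array still holds a reference, so nothing is destroyed
    array.reset();  // frees the buffer, then the last reference goes away
    return log;
}
```

With this arrangement the order in which the handles are released no longer matters: the buffer is always freed before the device API object is destroyed.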

The simplest solution, though, is to unload the module explicitly in the main function.
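
A minimal sketch of that workaround, assuming the models live in a registry that can be emptied before main() returns. FakeModule stands in for tvm::runtime::Module, and ModelRegistry and ServeAndShutDown are hypothetical names for this example only; in the real program the registry would hold the loaded modules.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for tvm::runtime::Module; it only records its destruction.
struct FakeModule {
    FakeModule(const std::string& name, std::vector<std::string>* log)
        : name_(name), log_(log) {}
    ~FakeModule() { log_->push_back("unload " + name_); }
    std::string name_;
    std::vector<std::string>* log_;
};

// Hypothetical registry that owns all models used for serving.
class ModelRegistry {
   public:
    void Register(const std::string& name, std::vector<std::string>* log) {
        models_[name].reset(new FakeModule(name, log));
    }
    // Releases every model now instead of waiting for process teardown.
    void Clear() { models_.clear(); }
    std::size_t Size() const { return models_.size(); }

   private:
    std::map<std::string, std::unique_ptr<FakeModule>> models_;
};

// Plays the role of main(): register models, serve, then unload explicitly
// while the runtime's own global state is still intact.
std::size_t ServeAndShutDown(ModelRegistry* registry,
                             std::vector<std::string>* log) {
    registry->Register("model_a", log);
    registry->Register("model_b", log);
    // ... serve requests here ...
    registry->Clear();  // explicit unload before returning from main()
    assert(registry->Size() == 0);
    return log->size();
}
```

In the original example this corresponds to deleting pA at the end of main() and setting it to nullptr, so that after_main() has nothing left to destroy.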


Thank you very much.