Autotuning cross-ARM64 with xgboost crashed

Hi, I followed the tutorial https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_arm.html#sphx-glr-tutorials-autotvm-tune-relay-arm-py but after a few seconds, it crashed.

I captured a core dump, and here is the backtrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1e60ca1e37 in std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > xgboost::XGBoostParameter<xgboost::GenericParameter>::UpdateAllowUnknown<std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > >(std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, bool*) () from /home/junjie/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
[Current thread is 1 (Thread 0x7f1e84dcd740 (LWP 11461))]
(gdb) bt
#0  0x00007f1e60ca1e37 in std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > xgboost::XGBoostParameter<xgboost::GenericParameter>::UpdateAllowUnknown<std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > >(std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, bool*) () from /home/junjie/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#1  0x00007f1e60c8f0b7 in xgboost::GenericParameter::ConfigureGpuId(bool) () from /home/junjie/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#2  0x00007f1e60caacd1 in xgboost::LearnerImpl::Configure() () from /home/junjie/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#3  0x00007f1e60ca569d in xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*) () from /home/junjie/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#4  0x00007f1e60ba8639 in XGBoosterUpdateOneIter () from /home/junjie/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#5  0x00007f1e7864ddae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#6  0x00007f1e7864d71f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#7  0x00007f1e788615c4 in _ctypes_callproc () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#8  0x00007f1e78861c33 in ?? () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#9  0x00000000005a9cbc in _PyObject_FastCallKeywords ()

I am using the master branch of incubator-tvm from GitHub:

commit 63f84a11353791ac3f8916cdcf7c2c6e6d45c4fb (HEAD -> master, origin/master, origin/HEAD)
Author: Thierry Moreau <tmoreau@octoml.ai>
Date:   Sat May 16 09:08:59 2020 -0700

    fix rpc server bug on VTA (#5607)

I changed the tuning method to _random_ and it seems fine so far (the run has not finished yet). Any comments?
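For reference, this is roughly how I switched tuners. It follows the `tuning_option` dict from the tutorial; the values here are illustrative, not my exact settings:

```python
# Sketch of the tutorial's tuning_option, with the tuner switched from the
# default 'xgb' (which crashes as described above) to 'random'.
tuning_option = {
    'log_filename': 'tune.log',
    'tuner': 'random',      # was 'xgb'
    'n_trial': 1500,
    'early_stopping': 800,
}
```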

After some debugging, I found that the following code reproduces the crash. Results by environment:

  • x86 PC(ubuntu 18.04) : crash
  • Docker in MAC 10.15 (ubuntu 18.04): crash
  • MAC 10.15 : no problem
  • ARM Board with Ubuntu 18.04 : no problem
import numpy as np
import tvm  # importing tvm before xgboost is what triggers the crash

import xgboost as xgb

# Dummy training data: 8 rows, 679 all-zero features
dtrain2 = xgb.DMatrix(np.zeros((8, 679)), np.zeros((8,)))
xgb_params = {
    'max_depth': 3,
    'gamma': 0.0001,
    'min_child_weight': 1,
    'subsample': 1.0,
    'eta': 0.3,
    'lambda': 1.00,
    'alpha': 0,
    'objective': 'rank:pairwise',
}
bst = xgb.train(xgb_params, dtrain2)

And the weird thing is, if you don't import tvm before xgboost, it works fine. I then tried rebuilding xgboost with debug symbols, but the issue disappeared... I think I will just move on temporarily.
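Until the root cause is found, one can at least detect the problematic import order at runtime. Below is a hypothetical stdlib-only guard (the name `check_import_order` is my own), based purely on the observation above that importing tvm before xgboost triggers the crash:

```python
import sys

def check_import_order():
    """Return 'tvm-first' if tvm is already loaded while xgboost is not,
    i.e. the import order observed to trigger the crash; else 'ok'."""
    if "tvm" in sys.modules and "xgboost" not in sys.modules:
        return "tvm-first"
    return "ok"

# Call this before your first `import xgboost`; in a fresh interpreter
# neither module is loaded yet, so the order can still be fixed.
print(check_import_order())
```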