Segfault in auto-tuning tutorial - tune_relay_x86.py

Hi, I’m running the xgb tuner and my tuning process terminates with nothing but a segfault message.

I’m running this tutorial code on two different machines with almost identical software stacks. Oddly, only the machine with an Intel® Xeon® CPU E5-2430 0 @ 2.20GHz segfaults. The other one works beautifully.

Does the CPU type matter with the xgb tuner given that it adopts model-based cost prediction?

Any thoughts or suggestions would be very helpful. Thanks!

Can you provide more detailed error information? You can print the stack with gdb (bt).
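
As a complement to gdb, Python's standard `faulthandler` module can dump the Python-level traceback when the interpreter receives SIGSEGV, which helps map the native frames back to the Python call that triggered them. A minimal sketch, assuming it is placed at the top of the tutorial script; the helper name `enable_crash_traces` and the log file name are illustrative and not part of TVM:

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
DEFAULT_LOG = os.path.join(tempfile.gettempdir(), "segfault_traceback.log")


def enable_crash_traces(log_path=DEFAULT_LOG):
    """Dump Python tracebacks of all threads to log_path on a fatal signal."""
    # The file object must stay open for as long as faulthandler is enabled,
    # so it is returned to the caller instead of being closed here.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call this before any tuning code runs.
crash_log = enable_crash_traces()
```

After a crash, the log file should show which Python line was executing, which gdb reports only as `??` frames when the interpreter has no debug symbols.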

Here is the error info from gdb.

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fff9be25e37 in std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > xgboost::XGBoostParameter<xgboost::GenericParameter>::UpdateAllowUnknown<std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > >(std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, bool*) ()
   from /home/sunggg/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so

+ stack info

#0  0x00007fff9be25e37 in std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > xgboost::XGBoostParameter<xgboost::GenericParameter>::UpdateAllowUnknown<std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > >(std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, bool*) ()
   from /home/sunggg/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#1  0x00007fff9be130b7 in xgboost::GenericParameter::ConfigureGpuId(bool) () from /home/sunggg/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#2  0x00007fff9be2ecd1 in xgboost::LearnerImpl::Configure() () from /home/sunggg/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#3  0x00007fff9be2969d in xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*) ()
   from /home/sunggg/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#4  0x00007fff9bd2c639 in XGBoosterUpdateOneIter () from /home/sunggg/.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so
#5  0x00007fffba0c4dae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#6  0x00007fffba0c471f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#7  0x00007fffba2d85c4 in _ctypes_callproc () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#8  0x00007fffba2d8c33 in ?? () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#9  0x00000000005aa6ec in _PyObject_FastCallKeywords ()
#10 0x000000000050abb3 in ?? ()
#11 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#12 0x0000000000508245 in ?? ()
#13 0x000000000050a080 in ?? ()
#14 0x000000000050aa7d in ?? ()
#15 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#16 0x0000000000508245 in ?? ()
#17 0x000000000050a080 in ?? ()
#18 0x000000000050aa7d in ?? ()
#19 0x000000000050d390 in _PyEval_EvalFrameDefault ()
#20 0x0000000000508245 in ?? ()
#21 0x000000000050a080 in ?? ()
#22 0x000000000050aa7d in ?? ()
#23 0x000000000050d390 in _PyEval_EvalFrameDefault ()
#24 0x0000000000509d48 in ?? ()
#25 0x000000000050aa7d in ?? ()
#26 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#27 0x0000000000508245 in ?? ()
#28 0x000000000050a080 in ?? ()
#29 0x000000000050aa7d in ?? ()
#30 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#31 0x0000000000508245 in ?? ()
#32 0x0000000000509642 in _PyFunction_FastCallDict ()
#33 0x0000000000595311 in ?? ()
#34 0x00000000005a067e in PyObject_Call ()
#35 0x000000000050d966 in _PyEval_EvalFrameDefault ()
#36 0x0000000000508245 in ?? ()
#37 0x000000000050a080 in ?? ()
#38 0x000000000050aa7d in ?? ()
#39 0x000000000050d390 in _PyEval_EvalFrameDefault ()
#40 0x0000000000508245 in ?? ()
#41 0x000000000058958c in ?? ()
#42 0x00000000005a067e in PyObject_Call ()
#43 0x000000000050d966 in _PyEval_EvalFrameDefault ()
#44 0x0000000000509d48 in ?? ()
#45 0x000000000050aa7d in ?? ()
#46 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#47 0x0000000000508245 in ?? ()
#48 0x000000000050b403 in PyEval_EvalCode ()
#49 0x0000000000635222 in ?? ()
#50 0x00000000006352d7 in PyRun_FileExFlags ()
#51 0x0000000000638a8f in PyRun_SimpleFileExFlags ()
#52 0x0000000000639631 in Py_Main ()
#53 0x00000000004b0f40 in main ()

Even after reinstalling xgboost and its dependencies, I’m stuck with the same error. FYI, the xgboost installation itself completes without any issue.

Can you show your code? Did you modify the official tutorial?

You can also run xgboost on its own, outside TVM, to see whether the problem is there.
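
One way to check xgboost on its own is to train a tiny model directly, without TVM, in a child process so that a segfault shows up as a return code instead of killing your shell session. This is only a sketch: the helper names `run_isolated` and `describe`, the random data, and the training parameters are arbitrary choices for illustration, not what the xgb tuner actually uses.

```python
import subprocess
import sys

# Tiny training run. xgb.train() goes through Booster.update(), which is the
# Python wrapper around the XGBoosterUpdateOneIter call seen in the backtrace.
XGBOOST_SMOKE_TEST = """
import numpy as np
import xgboost as xgb

X = np.random.rand(64, 8)
y = np.random.rand(64)
dtrain = xgb.DMatrix(X, label=y)
params = {"max_depth": 3, "objective": "reg:squarederror"}
xgb.train(params, dtrain, num_boost_round=2)
print("xgboost", xgb.__version__, "trained OK")
"""


def run_isolated(snippet):
    """Run snippet in a fresh interpreter; return (returncode, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # spelled this way to stay Python 3.6 compatible
    )
    return proc.returncode, proc.stdout, proc.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    # On POSIX, a negative return code -N means the child was killed by
    # signal N (SIGSEGV is 11 on Linux).
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exited with error code %d" % returncode


if __name__ == "__main__":
    code, out, err = run_isolated(XGBOOST_SMOKE_TEST)
    print(describe(code), out.strip(), err.strip())
```

If this reports that the child was killed by signal 11, the crash reproduces without TVM, which points at the xgboost install rather than the tuner.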

The only change I made to the official tutorial was setting the tuner to ‘xgb’. That’s all.

Can you please elaborate on your suggestion? I assumed the xgb tuner already relies on the xgboost library.

Hi all!

I’m experiencing the very same issue. Stock tutorial, setting the tuner to xgb.

I built TVM with debug info and ran the whole stack under gdb, but as the backtrace from @sunggg shows, the segfault actually occurs in libxgboost.so, not in TVM itself.

Looking into my Python virtual environment, here is my xgboost metadata:

Metadata-Version: 2.1
Name: xgboost
Version: 1.0.1
Summary: XGBoost Python Package
Home-page: https://github.com/dmlc/xgboost
Maintainer: Hyunsu Cho
Maintainer-email: chohyu01@cs.washington.edu
License: Apache-2.0
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.5
Requires-Dist: numpy
Requires-Dist: scipy
Provides-Extra: dask
Requires-Dist: dask ; extra == 'dask'
Requires-Dist: pandas ; extra == 'dask'
Requires-Dist: distributed ; extra == 'dask'
Provides-Extra: datatable
Requires-Dist: datatable ; extra == 'datatable'
Provides-Extra: pandas
Requires-Dist: pandas ; extra == 'pandas'
Provides-Extra: plotting
Requires-Dist: graphviz ; extra == 'plotting'
Requires-Dist: matplotlib ; extra == 'plotting'
Provides-Extra: sklearn
Requires-Dist: sklearn ; extra == 'sklearn'

And my CPU info: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz

What is going wrong? @sunggg, do you have the same xgboost library version? Could it be a bug in the library? If so, we could try another version. Or is TVM somehow misusing the library?

Best regards, Robert

Hi, @robeastbme. I have xgboost (1.0.2) and it still does not work.

I tried this with xgboost v1.0.2 installed and it works for me. The only thing I changed in the example is the tuner. Maybe something is wrong with your installation?

Hi everyone:

The cause of the segmentation fault in the xgb auto-tuner has been fixed: https://github.com/apache/incubator-tvm/issues/4953

The upcoming release of XGBoost (1.1.0) will have the fix. You can try the release candidate today at https://github.com/dmlc/xgboost/issues/5593.
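
To tell whether an installed xgboost already includes the fix, you can compare its version string against 1.1.0. A minimal sketch, assuming the fix first ships in 1.1.0 as stated above; `parse_version` and `has_segfault_fix` are hypothetical helper names, and a release-candidate string such as `1.1.0rc2` is treated as 1.1.0:

```python
import re

# Assumption taken from the post above: the fix first ships in XGBoost 1.1.0.
FIXED_IN = (1, 1, 0)


def parse_version(version):
    """Return the leading major.minor.patch numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def has_segfault_fix(version):
    """True if the version is at or above the release assumed to hold the fix."""
    return parse_version(version) >= FIXED_IN
```

In practice you would call `has_segfault_fix(xgboost.__version__)` after `import xgboost`. It returns False for the 1.0.1 and 1.0.2 installs mentioned in this thread.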
