autotvm.task.extract_from_program() gets Segmentation fault

While following the official tutorial "Tuning High Performance Convolution on NVIDIA GPUs", I get Segmentation fault (core dumped) every time this code runs:

```python
func = mod["main"]
tasks = autotvm.task.extract_from_program(
    func, target=target, params=params, ops=(relay.op.nn.conv2d,))
```
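For context, here is a minimal sketch of the surrounding setup, assuming the model is loaded from an MXNet checkpoint as the log below suggests (the checkpoint prefix and epoch are illustrative assumptions; the input shape is taken from the warnings below):

```python
import mxnet as mx
import tvm
from tvm import relay, autotvm

# Hypothetical checkpoint prefix/epoch; adjust to the actual model files.
sym, arg_params, aux_params = mx.model.load_checkpoint("model", 0)
shape_dict = {"data": (1, 3, 112, 112)}
mod, params = relay.frontend.from_mxnet(sym, shape_dict,
                                        arg_params=arg_params,
                                        aux_params=aux_params)

target = tvm.target.cuda()
func = mod["main"]
tasks = autotvm.task.extract_from_program(
    func, target=target, params=params, ops=(relay.op.nn.conv2d,))
```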

When I run the model for plain inference (without tuning), I see the following output:

```
[10:07:59] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.2.0. Attempting to upgrade...
[10:07:59] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 3, 112, 112), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 3, 112, 112), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (64, 64, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 56, 56), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 56, 56), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 64, 56, 56), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 128, 28, 28), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 128, 28, 28), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 128, 28, 28), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 256, 14, 14), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 256, 14, 14), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 256, 14, 14), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 512, 7, 7), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('dense_small_batch.cuda', ('TENSOR', (1, 25088), 'float32'), ('TENSOR', (512, 25088), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
create gragh_runtime take 3.6898140907287598 second
set input and params take 0.18898606300354004 second
run take 1.5326874256134033 second
get_output take 4.76837158203125e-06 second
(512,)
dist : 1.9115334749221802
```

So I need to tune these unconfigured operators, right? Since I cannot find these operators under relay.op.nn, does that mean I cannot extract them with autotvm.task.extract_from_program()? And can tasks for these operators be written by hand?

Later I looked up the operators reported missing in the log above and changed the code to the following:

```python
tasks = autotvm.task.extract_from_program(
    func, target=target, params=params,
    ops=(topi.cuda.conv2d_hwcn, topi.cuda.conv2d_nchw_winograd,
         topi.cuda.dense_small_batch))
```

When I run it, I still get Segmentation fault (core dumped).

Then I tried to run it on a Windows 10 system, with a slight change:

```python
tasks = autotvm.task.extract_from_program(
    func, target=target, params=params,
    ops=(topi.x86.conv2d_NCHWc, topi.x86.dense_nopack))
```

The same model is used in both cases, of course. I am curious how these operator names are supposed to be known, and why three operators are reported missing on the GPU but only two on the CPU.
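For what it's worth, the schedule names can be read directly from the fallback warnings above (e.g. conv2d_nchw.cuda, dense_small_batch.cuda). A minimal sketch, assuming extraction succeeds, that prints the name and workload of each extracted task:

```python
# Each extracted autotvm task carries the schedule name and workload
# arguments that the "Cannot find config" warnings refer to.
for t in tasks:
    print(t.name, t.args)
# e.g.: conv2d_nchw.cuda (('TENSOR', (1, 3, 112, 112), 'float32'), ...)
```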

Debug output on Windows 10 (two interleaved tracebacks, one from the lowering thread and one from the main thread):

```
[INFO] tuing ...
Traceback (most recent call last):
  File "D:\anoconda\envs\python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "D:\anoconda\envs\python36\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "D:\anoconda\envs\python36\lib\site-packages\tvm-0.7.dev1-py3.6-win-amd64.egg\tvm\autotvm\task\relay_integration.py", line 54, in _lower
    compiler.lower(mod, target=target)
  File "D:\anoconda\envs\python36\lib\site-packages\tvm-0.7.dev1-py3.6-win-amd64.egg\tvm\relay\backend\vm.py", line 134, in lower
    self._lower(mod, target, target_host)
  File "D:\anoconda\envs\python36\lib\site-packages\tvm-0.7.dev1-py3.6-win-amd64.egg\tvm\_ffi\_ctypes\packed_func.py", line 212, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
OSError: exception: stack overflow

Traceback (most recent call last):
  File "E:/cp/project/face_classification/tvm_submodule/tune_mxnet_cuda.py", line 140, in <module>
    main(model_prefix, model_epoch)
  File "E:/cp/project/classification/tvm_submodule/tune_mxnet_cuda.py", line 134, in main
    tune_and_evaluate(tuning_option, prefix=pre, epoch=epoch)
```

How should I generate tasks now?

You can try applying this PR: https://github.com/apache/incubator-tvm/pull/5019. It should fix the stack overflow issue.

Thank you very much. In the end I gave up on extract_from_program() and created the tasks manually by referring to its source code, similar to this:

```python
t1 = autotvm.task.create(
    "conv2d_nchw.cuda",
    args=(('TENSOR', (1, 3, 112, 112), 'float32'),
          ('TENSOR', (64, 3, 3, 3), 'float32'),
          (1, 1), (1, 1, 1, 1), (1, 1), 'float32'),
    target=target)
```
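For completeness, a minimal sketch of tuning such a manually created task; the builder/runner settings, trial count, and log file name are illustrative assumptions, not from the original post:

```python
from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

# Illustrative measurement options (assumed values).
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=150))

# Tune the manually created task and append results to a log file.
tuner = XGBTuner(t1)
tuner.tune(n_trial=100,
           measure_option=measure_option,
           callbacks=[autotvm.callback.log_to_file("conv2d_nchw_cuda.log")])
```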

@a876571186 did you fix the issue? How? Thanks!