Recently I have been auto-tuning Faster R-CNN from GluonCV with AutoTVM.
My setup is:
- RTX 2080 Ti
- CentOS 7.5.1804
- CUDA 10 with cuDNN 7.4
However, during the tuning process, the following error occurs:
what(): [23:13:20] /home/liuxin/3rdparty/source_code/tvm-1/src/runtime/cuda/cuda_module.cc:61: CUDAError: cuModuleUnload(module_[i]) failed with error: CUDA_ERROR_ILLEGAL_ADDRESS
Stack trace:
[bt] (0) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0x7ec952) [0x7fa22d678952]
[bt] (1) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0xf1eee8) [0x7fa22ddaaee8]
[bt] (2) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0xec1954) [0x7fa22dd4d954]
[bt] (3) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0xecd062) [0x7fa22dd59062]
[bt] (4) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0xef49e0) [0x7fa22dd809e0]
[bt] (5) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0xef4cc0) [0x7fa22dd80cc0]
[bt] (6) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0xef9df8) [0x7fa22dd85df8]
[bt] (7) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0xef6ba7) [0x7fa22dd82ba7]
[bt] (8) /home/liuxin/3rdparty/source_code/tvm-1/build/libtvm.so(+0xefab5d) [0x7fa22dd86b5d]
I have checked the line of code that raises the error, but I don't understand how the error happens. Does anyone know what causes this?
In addition, the error only occurs partway through the tuning process, so I have to wait a long time to reproduce it. How can I avoid this?