Issue description
Inference on an NVIDIA Tesla T4 with the GluonCV model mobilenetv2_1.0, auto-tuned with set_cuda_target_arch('sm_75') for batch size 10 and compiled at opt_level=1, fails with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES.
Steps to reproduce the issue
- Prepare hardware and environment that meet the requirements for TVM auto-tuning on an NVIDIA Tesla T4
- Set the target architecture to that of the Tesla T4 by executing tvm.autotvm.measure.measure_methods.set_cuda_target_arch('sm_75') before auto-tuning
- Execute auto-tuning for batch size 10 of the GluonCV 0.7.0 classification model mobilenetv2_1.0 according to the tutorial for NVIDIA GPU (https://docs.tvm.ai/tutorials/autotvm/tune_relay_cuda.html), in the environment prepared in step 1, with the target architecture set as in step 2
- Compile the tuned model at opt_level=1
- Execute inference with the tuned and compiled model on batches of size 10 of COCO image data
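The steps above can be condensed into a single script. This is a minimal sketch, not the exact script used: it assumes tvm (built with CUDA) and gluoncv are installed, a Tesla T4 is visible as GPU 0, and it elides the auto-tuning loop itself (which follows the linked tune_relay_cuda tutorial); the `reproduce` helper name and the 224x224 input shape are assumptions.

```python
# Hedged repro sketch for the reported failure. Assumes tvm (with CUDA) and
# gluoncv are installed and an NVIDIA Tesla T4 is available as GPU 0.
batch_size = 10
input_shape = (batch_size, 3, 224, 224)  # assumed mobilenetv2_1.0 input shape

def reproduce(arch="sm_75", opt_level=1):
    """Compile and run mobilenetv2_1.0 on CUDA; in the failing case
    (arch='sm_75', tuned logs applied) module.run() raises
    CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES."""
    import numpy as np
    import tvm
    from tvm import relay
    from tvm.contrib import graph_runtime
    from tvm.autotvm.measure.measure_methods import set_cuda_target_arch
    from gluoncv.model_zoo import get_model

    set_cuda_target_arch(arch)  # step 2: pin the CUDA target architecture

    # Step 3 (elided): auto-tune per the tune_relay_cuda tutorial and apply
    # the resulting tuning log around the build below.
    block = get_model("mobilenetv2_1.0", pretrained=True)
    mod, params = relay.frontend.from_mxnet(block, shape={"data": input_shape})

    # Step 4: compile the model at the given opt_level
    with relay.build_config(opt_level=opt_level):
        graph, lib, params = relay.build(mod, target="cuda", params=params)

    # Step 5: run inference on one batch (random data stands in for COCO images)
    ctx = tvm.gpu(0)
    module = graph_runtime.create(graph, lib, ctx)
    data = np.random.uniform(size=input_shape).astype("float32")
    module.set_input("data", data, **params)
    module.run()  # fails here with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
    return module.get_output(0)
```

Calling `reproduce("sm_75")` after applying the tuning log triggers the error, while `reproduce("sm_62")` succeeds, per the details below.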
What’s the expected result?
- Inference succeeds without errors
What’s the actual result?
- Inference fails with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
Additional details
- Inference succeeds if the model is auto-tuned with set_cuda_target_arch('sm_62') instead of set_cuda_target_arch('sm_75')
- "sm_62" denotes an architecture different from that of the Tesla T4
- The correct target architecture for the Tesla T4 is "sm_75", so tuning, compilation, and inference should all succeed with this setting
- Possibly related discussions: [CUDA]Got Error: CUDA ERROR LAUNCH OUT OF RESOURCES and Got error on Jetson TX2 with resnet50_v2 CUDA OUT_OF_RESOURCES
Suggested solutions
- Fix TVM so that auto-tuning with the correct target architecture ('sm_75') produces a model that runs without errors