Inference on an NVIDIA Tesla T4 with the GluonCV model mobilenetv2_1.0, auto-tuned with set_cuda_target_arch('sm_75') for batch size 10 and compiled at opt_level=1, fails with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES.
Steps to reproduce the issue
- Prepare hardware and environment that meet the requirements for TVM auto-tuning on an NVIDIA Tesla T4
- Set the target architecture to that of the Tesla T4 by executing set_cuda_target_arch('sm_75')
- Auto-tune the GluonCV 0.7.0 classification model mobilenetv2_1.0 for batch size 10, following the tutorial for NVIDIA GPUs (https://docs.tvm.ai/tutorials/autotvm/tune_relay_cuda.html), in the environment prepared in step 1 and with the target architecture set as in step 2
- Compile the tuned model at opt_level=1
- Execute inference with the tuned and compiled model on batches of size 10 of COCO image data
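The failure mode behind the steps above can be illustrated with a self-contained sketch (the limit numbers below are hypothetical stand-ins, not measured on a T4 or TX2): the tuner may pick a kernel launch configuration that fits one architecture's per-block resource budget but exceeds another's, and the CUDA driver rejects such a launch only at run time, with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES.

```python
# Illustrative sketch of the per-block resource check whose failure
# surfaces as CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES. All limit numbers are
# hypothetical, for illustration only.

ARCH_LIMITS = {
    # arch: (max threads per block, max 32-bit registers per block)
    "sm_62": (1024, 32768),
    "sm_75": (1024, 65536),
}

def launch_fits(arch, threads_per_block, regs_per_thread):
    """Return True if the launch configuration fits the arch's limits."""
    max_threads, max_regs = ARCH_LIMITS[arch]
    return (threads_per_block <= max_threads
            and threads_per_block * regs_per_thread <= max_regs)

# A configuration of 512 threads x 96 registers needs 49152 registers:
# it fits the sm_75 budget above but exceeds the sm_62 one, so a kernel
# tuned against the wrong limits can fail only when actually launched.
config = (512, 96)
print(launch_fits("sm_62", *config))  # False under the illustrative limits
print(launch_fits("sm_75", *config))  # True under the illustrative limits
```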
What’s the expected result?
- Inference succeeds without errors
What’s the actual result?
- Inference fails with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
- Inference succeeds if the model is instead auto-tuned with set_cuda_target_arch('sm_62')
- "sm_62" signifies an architecture different from that of the Tesla T4
- The correct target architecture for Tesla T4 is “sm_75”, so tuning, compilation and inference should all succeed with this setting
- Possibly related discussions: "[CUDA] Got Error: CUDA ERROR LAUNCH OUT OF RESOURCES" and "Got error on Jetson TX2 with resnet50_v2 CUDA OUT_OF_RESOURCES"
- Fix TVM so that the correct target architecture setting yields the expected results
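For reference, the mapping between the two arch strings and the devices discussed here can be sketched as a small lookup. The compute capabilities (Tesla T4: 7.5, Jetson TX2: 6.2) come from NVIDIA's published CUDA GPU tables; the helper function itself is purely illustrative.

```python
# Compute capabilities of the devices discussed in this issue
# (per NVIDIA's published CUDA GPU tables); the helper is illustrative.
COMPUTE_CAPABILITY = {
    "Tesla T4": (7, 5),     # Turing -> sm_75
    "Jetson TX2": (6, 2),   # Pascal-based Tegra -> sm_62
}

def sm_arch(device_name):
    """Map a device name to the 'sm_XY' string used by set_cuda_target_arch."""
    major, minor = COMPUTE_CAPABILITY[device_name]
    return f"sm_{major}{minor}"

print(sm_arch("Tesla T4"))    # sm_75
print(sm_arch("Jetson TX2"))  # sm_62
```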