Problems deploying models on Jetson TX2

I tried to compile resnet50 and deploy it on a Jetson TX2, but got a CUDA error: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES. It seems that the shared memory allocated for some fused conv kernels is too large for this edge device. Can the TVM compiler provide an option to specify the shared memory limit on NVIDIA GPUs?
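To make the failure mode concrete, here is a rough back-of-the-envelope sketch (plain Python; the tile sizes and the cost model are illustrative assumptions, not TVM's actual schedule) of how a tiled conv kernel's shared-memory footprint can blow past a per-block limit. The 48 KB figure is the per-block shared memory limit on Pascal-class GPUs, which includes the TX2.

```python
def conv_tile_smem_bytes(tile_h, tile_w, kernel, channels, dtype_bytes=4):
    """Illustrative model only: shared memory needed to stage one input tile
    (with its halo) plus the kernel weights for a tiled convolution."""
    halo_h = tile_h + kernel - 1
    halo_w = tile_w + kernel - 1
    input_bytes = halo_h * halo_w * channels * dtype_bytes
    weight_bytes = kernel * kernel * channels * dtype_bytes
    return input_bytes + weight_bytes

TX2_SMEM_PER_BLOCK = 48 * 1024  # 48 KB per block on Pascal-class GPUs

# A schedule tuned for a big desktop GPU may pick an aggressive tile size:
need = conv_tile_smem_bytes(tile_h=32, tile_w=32, kernel=3, channels=128)
print(need, need <= TX2_SMEM_PER_BLOCK)  # far over the limit

# A more conservative tile fits comfortably:
small = conv_tile_smem_bytes(tile_h=8, tile_w=8, kernel=3, channels=16)
print(small, small <= TX2_SMEM_PER_BLOCK)
```

The point is that a schedule parameterization chosen for one device can be structurally invalid on another, which is exactly what per-device tuning avoids.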

Are you using an up-to-date TVM, where the CUDA conv ops are implemented with AutoTVM?

During auto-tuning, AutoTVM queries the device for its resource limits (via deviceQuery) and makes sure that the generated schedules respect those constraints. But if you are using a fallback schedule on the TX2, that issue can happen. The currently available fallback schedules are tuned for GTX 1080 Ti-class devices, so I don't think those parameters are appropriate for the TX2.

Currently there is no such option. I did some early experiments on a TX2, but I didn't upload the logs. You can try this tutorial: https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_cuda.html#sphx-glr-tutorials-autotvm-tune-nnvm-cuda-py (use RPC mode for embedded devices).
You can set a small `n_trial` to see early results.
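The RPC tuning flow can be sketched roughly as below. This is a guess at the shape of the setup, not a verified recipe: the `"tx2"` device key, tracker host/port, `n_trial`, and log file name are placeholder assumptions, and the exact API may differ by TVM version, so check it against the tutorial. The import is guarded so the sketch is a no-op where tvm is not installed.

```python
try:
    from tvm import autotvm
except ImportError:
    autotvm = None  # tvm not available in this environment

def tune_task_on_tx2(task, device_key="tx2", host="127.0.0.1", port=9190,
                     n_trial=50, log_file="tx2.log"):
    """Tune one AutoTVM task on a remote TX2 registered with an RPC tracker.
    A small n_trial gives early results at the cost of schedule quality."""
    if autotvm is None:
        return False
    measure_option = autotvm.measure_option(
        # Cross-compile locally, run and time the candidates on the TX2:
        builder=autotvm.LocalBuilder(timeout=10),
        runner=autotvm.RPCRunner(device_key, host=host, port=port,
                                 number=4, timeout=10),
    )
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(n_trial=n_trial,
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file(log_file)])
    return True
```

The resulting log file can then be loaded with `autotvm.apply_history_best` at compile time so the tuned schedules replace the fallback ones.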

Thanks for the instructions. My branch was rebased onto master last Wednesday/Thursday. During the compile stage, I don't see the message about falling back to default schedules for that model.

Thanks for your answer. That's what I was about to try. I'll circle back if I encounter any problems.