TVM CUDA codegen and Tensor Core tutorial

Hi:

I am learning TVM for the GPU backend, and I have a question after reading code_gen. It seems that the GPU backend workflow is that TVM first generates CUDA code and then calls NVRTC to compile it to PTX directly. However, in the Tensor Core tutorial (https://docs.tvm.ai/tutorials/optimize/opt_conv_tensorcore.html), TVM uses NVCC to compile the CUDA code instead of using NVRTC to generate the PTX.

I am quite confused about the difference between these two workflows…

TVM uses NVRTC as the default CUDA compiler. However, we can switch to NVCC by registering the following callback.

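import tvm
from tvm.contrib import nvcc

# Registering a function under this name replaces the default NVRTC path.
@tvm.register_func
def tvm_callback_cuda_compile(code):
    ptx = nvcc.compile_cuda(code, target="ptx")
    return ptx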

NOTE: If you import topi or another package that contains these lines of code, TVM will also use NVCC, even if you do not register the callback explicitly yourself.

NVCC and NVRTC are actually two different backend compilers. We can use either NVCC or NVRTC in almost every case. The PTX code generated by NVCC is usually faster than the PTX code generated by NVRTC.