What about using Clang/LLVM to compile the generated CUDA kernels?

I read the code and docs, which seem to suggest that TVM compiles the generated CUDA kernel to PTX with NVCC or NVRTC. Have you integrated Clang/LLVM to compile the CUDA kernels? If so, how does the performance compare to NVCC? I found that some other open source projects integrate Clang/LLVM into the program to compile the generated CUDA code, but as far as I know, LLVM's CUDA toolchain is somewhat dated and not reliable enough. I'd appreciate it if you could share some experience with it.

Yeah… Any idea about the performance difference between NVCC and Clang/LLVM when compiling kernels?

There is already an NVPTX backend that does what you describe, and you can try it out.
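
For anyone who wants to compare the two compilers on a dumped CUDA kernel outside of TVM, here is a minimal sketch of one way to do it, not an official workflow. It only builds (and, if the tools are installed, runs) the standard NVCC and Clang command lines that emit PTX. The helper names, the output file names, and the `sm_70` default are assumptions for illustration; Clang may also need `--cuda-path` pointing at your CUDA install.

```python
import shutil
import subprocess


def ptx_commands(src, arch="sm_70"):
    """Return NVCC and Clang command lines that compile `src` to PTX.

    Output file names are hypothetical placeholders.
    """
    nvcc = ["nvcc", "-ptx", f"-arch={arch}", src, "-o", "kernel_nvcc.ptx"]
    clang = [
        "clang++", "-x", "cuda", "-O3",
        f"--cuda-gpu-arch={arch}",
        "--cuda-device-only",  # skip host code, emit device code only
        "-S",                  # with --cuda-device-only this yields PTX
        src, "-o", "kernel_clang.ptx",
    ]
    return {"nvcc": nvcc, "clang": clang}


def compile_available(src, arch="sm_70"):
    """Run each command whose compiler is on PATH; return those that succeeded."""
    built = []
    for name, cmd in ptx_commands(src, arch).items():
        if shutil.which(cmd[0]) is None:
            continue  # compiler not installed, skip it
        if subprocess.run(cmd).returncode == 0:
            built.append(name)
    return built
```

Calling `compile_available("kernel.cu")` leaves one PTX file per compiler that succeeded; you can then load each one and time the kernel on your own workload, since the relative performance depends on the kernel and the toolchain versions.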