TL;DR: The current FindCUDA.cmake may pick up the wrong libcuda.so when multiple CUDA versions are installed, even if USE_CUDA is set properly. This can cause the TVM build to fail.
The issue.
# There is a default libcuda under `/usr/lib64/`
$ ll /usr/lib64/ | grep libcuda.so
lrwxrwxrwx 1 root root 12 Apr 17 15:21 libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 17 Apr 17 15:21 libcuda.so.1 -> libcuda.so.390.48
-rwxr-xr-x 1 root root 10033592 Apr 17 15:21 libcuda.so.390.48
# Another libcuda.so under `/opt/cuda/9.0`
$ ll /opt/cuda/9.0/lib64/stubs/ | grep libcuda.so
-rwxr-xr-x 1 root root 42176 Feb 6 2018 libcuda.so
# `USE_CUDA` is set properly
$ cat config.cmake | grep USE_CUDA
set(USE_CUDA /opt/cuda/9.0/)
# And `LD_LIBRARY_PATH` and ldconfig are correct
$ ldconfig -p | grep libcuda.so
libcuda.so.1 (libc6,x86-64) => /opt/cuda/9.0/lib64/stubs/libcuda.so.1
$ echo $LD_LIBRARY_PATH
/opt/cuda/9.0/lib64/stubs/:/opt/cuda/9.0/lib64
In this case, if we print out all CUDA-related variables in FindCUDA.cmake, we get:
-- Custom CUDA_PATH=/opt/cuda/9.0/
-- CUDA_FOUND=TRUE
-- CUDA_INCLUDE_DIRS=/opt/cuda/9.0//include
-- CUDA_TOOLKIT_ROOT_DIR=/opt/cuda/9.0/
-- CUDA_CUDA_LIBRARY=/usr/lib64/libcuda.so ######### Incorrect #########
-- CUDA_CUDART_LIBRARY=/opt/cuda/9.0/lib64/libcudart.so
-- CUDA_NVRTC_LIBRARY=/opt/cuda/9.0/lib64/libnvrtc.so
-- CUDA_CUDNN_LIBRARY=CUDA_CUDNN_LIBRARY-NOTFOUND
-- CUDA_CUBLAS_LIBRARY=/opt/cuda/9.0/lib64/libcublas.so
It causes a link-time error when building TVM:
/my/own/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
make[2]: *** [libtvm_runtime.so] Error 1
make[1]: *** [CMakeFiles/tvm_runtime.dir/all] Error 2
Cause of the issue.
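FindCUDA prefers /usr/lib64 because find_library searches the default system locations, which implicitly include /usr/lib64, before the directories given in PATHS. As a result, the customized path is ignored in the following CMake command: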
find_library(_CUDA_CUDA_LIBRARY cuda
PATHS ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs)
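One way to confirm this search order, assuming CMake 3.17 or newer is available, is to turn on find debugging around the call. This is a diagnostic sketch rather than part of FindCUDA.cmake; it should make CMake list the directories it considers for the library.

```cmake
# Diagnostic sketch (assumes CMake >= 3.17, where CMAKE_FIND_DEBUG_MODE exists).
# Clear any cached result first, otherwise find_library skips the search.
unset(_CUDA_CUDA_LIBRARY CACHE)

set(CMAKE_FIND_DEBUG_MODE TRUE)   # print every location find_library tries
find_library(_CUDA_CUDA_LIBRARY cuda
  PATHS ${CUDA_TOOLKIT_ROOT_DIR}
  PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs)
set(CMAKE_FIND_DEBUG_MODE FALSE)  # restore quiet behaviour for later find_* calls

message(STATUS "_CUDA_CUDA_LIBRARY=${_CUDA_CUDA_LIBRARY}")
```

The debug flag is meant for temporary use while investigating; it would be removed again once the cause is confirmed.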
Proposed solution.
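When a customized USE_CUDA is provided, I suggest not including the default paths in the search. The call below also adds lib64/stubs to the suffixes, since that is where the toolkit's libcuda.so lives in this setup. It would look like: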
find_library(_CUDA_CUDA_LIBRARY cuda
PATHS ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 lib64/stubs targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs
NO_DEFAULT_PATH) ###### disable default path here ######
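A minimal sketch of how the two cases could be separated, assuming USE_CUDA holds either a directory (custom toolkit) or a plain ON/OFF switch, and assuming CUDA_TOOLKIT_ROOT_DIR has already been set in the non-custom case. The branching shown here is illustrative and not the existing TVM logic.

```cmake
# Sketch only: restrict the search when the user gave an explicit toolkit path.
if(IS_DIRECTORY "${USE_CUDA}")
  # Custom toolkit: search nothing but the given root.
  set(CUDA_TOOLKIT_ROOT_DIR "${USE_CUDA}")
  find_library(_CUDA_CUDA_LIBRARY cuda
    PATHS ${CUDA_TOOLKIT_ROOT_DIR}
    PATH_SUFFIXES lib lib64 lib64/stubs targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs
    NO_DEFAULT_PATH)
else()
  # No custom path: keep the default system locations in the search.
  find_library(_CUDA_CUDA_LIBRARY cuda
    PATHS ${CUDA_TOOLKIT_ROOT_DIR}
    PATH_SUFFIXES lib lib64 lib64/stubs targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs)
endif()
```

With this split, a system-wide libcuda.so under /usr/lib64 can only be selected when no custom path was requested.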
Discussion.
Many university clusters have multiple CUDA versions installed, so this may become a minor issue at some point, though I don't think it is a big deal.
Also, why don't we print out all the paths found by the FindXXXX.cmake modules? For example, when running cmake .. to generate the Makefile, it could print the information below so that users are aware of which dependencies are used.
# For FindCUDA.cmake
- CUDA_FOUND=
- CUDA_INCLUDE_DIRS=
- CUDA_TOOLKIT_ROOT_DIR=
- CUDA_CUDA_LIBRARY=
- CUDA_CUDART_LIBRARY=
- CUDA_NVRTC_LIBRARY=
- CUDA_CUDNN_LIBRARY=
- CUDA_CUBLAS_LIBRARY=
# For FindLLVM.cmake
- LLVM_INCLUDE_DIRS=
- LLVM_LIBS=
- LLVM_DEFINITIONS=
- TVM_LLVM_VERSION=
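The printing could be as simple as the loop below, shown here for the CUDA variables only. It is a sketch, assuming it is placed after the find logic has run; the loop variable _var is a name chosen for illustration.

```cmake
# Sketch: report the resolved CUDA paths during configuration.
foreach(_var
    CUDA_FOUND
    CUDA_INCLUDE_DIRS
    CUDA_TOOLKIT_ROOT_DIR
    CUDA_CUDA_LIBRARY
    CUDA_CUDART_LIBRARY
    CUDA_NVRTC_LIBRARY
    CUDA_CUDNN_LIBRARY
    CUDA_CUBLAS_LIBRARY)
  # ${${_var}} dereferences the variable whose name is stored in _var.
  message(STATUS "${_var}=${${_var}}")
endforeach()
```

The same loop would work for the LLVM variables by swapping in their names. Because message(STATUS ...) prefixes each line with "-- ", the output has the same shape as the listing shown earlier in this post.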