[Solved] Nvrtc: error: invalid value for --gpu-architecture

NienfengYao · May 13, 2019, 7:18am

Dear all,
I am new to TVM. I followed the tutorial “Get Started with Tensor Expression” and got this error message:

(base) ryanyao@umihuang-HP-EliteDesk-800-G1-TWR:~/docker_mount/tvm_ex$ python tensor_expr_get_started.py
<class 'tvm.tensor.Tensor'>
terminate called after throwing an instance of 'dmlc::Error'
what(): [17:27:23] /home/ryanyao/tvm/src/codegen/opt/build_cuda_on.cc:118: Check failed: compile_res == NVRTC_SUCCESS (5 vs. 0) : nvrtc: error: invalid value for --gpu-architecture (-arch)

Aborted (core dumped)

Can someone give me a hint on how to solve it?
Thank you.

NienfengYao · May 13, 2019, 7:17am

I have solved it. I think something went wrong when I built TVM, so I rebuilt it and the problem is gone.

drord9 · August 13, 2019, 10:44am

Hi,

I’m getting the same error while trying to run tensor_expr_get_started.py:

Traceback (most recent call last):

  File "tensor_expr_get_started.py", line 139, in <module>
    fadd = tvm.build(s, [A, B, C], tgt, target_host=tgt_host, name="myadd")

  File "/home/drorca/.local/lib/python3.5/site-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/build_module.py", line 621, in build
    fhost, mdev = _build_for_device(flist, tar, target_host)

  File "/home/drorca/.local/lib/python3.5/site-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/build_module.py", line 488, in _build_for_device
    mdev = codegen.build_module(fdevice, str(target)) if fdevice else None

  File "/home/drorca/.local/lib/python3.5/site-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/codegen.py", line 36, in build_module
    return _Build(lowered_func, target)

  File "/home/drorca/.local/lib/python3.5/site-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 210, in __call__
    raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "/home/drorca/dror/tvm/tvm/src/codegen/opt/build_cuda_on.cc", line 119
nvrtc: Check failed: compile_res == NVRTC_SUCCESS (5 vs. 0) : error: invalid value for --gpu-architecture (-arch)

Maximilianxu · October 5, 2019, 7:04am

I ran into an error similar to the one above. Do you remember how you solved it?

NienfengYao · October 5, 2019, 7:32am

Remove TVM, get a fresh copy of the TVM source, and build it again. In my case the root cause was a problem (with assigning the target) during my previous TVM build.

Maximilianxu · October 5, 2019, 7:55am

I just tried downloading a fresh copy of the source and rebuilding it, but that didn’t work in my case:

nvcc fatal   : Value 'sm_75' is not defined for option 'gpu-architecture'

Thanks anyway.

jonso · October 6, 2019, 1:39am

This error is coming from nvcc. Do you have multiple versions of nvcc on your machine? Can you make sure that the correct one is first on the path?
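One way to check this is to list every nvcc found on PATH in lookup order. The following is a minimal sketch using only the Python standard library; the helper name `find_all_on_path` is made up for illustration and is not part of TVM or CUDA.

```python
import os


def find_all_on_path(name="nvcc", path=None):
    """Return every executable called `name` on PATH, in lookup order.

    Directories that resolve to the same real file (for example a
    /usr/local/cuda symlink and the directory it points to) are
    reported only once.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            continue
        real = os.path.realpath(candidate)
        if real in seen:
            continue
        seen.add(real)
        hits.append(candidate)
    return hits


if __name__ == "__main__":
    for index, exe in enumerate(find_all_on_path("nvcc")):
        # The first hit is the one a plain `nvcc` command resolves to.
        note = "  <- first on PATH" if index == 0 else ""
        print(exe + note)
```

If the first entry belongs to an older toolkit than the one you expect, moving the newer toolkit's bin directory to the front of PATH should change which nvcc gets picked up.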

Maximilianxu · October 7, 2019, 8:28am

nvcc -V reports version 9.0.176, but nvidia-smi reports CUDA version 10.1. My TensorFlow version is 1.12.0. The /usr/local/ directory contains the following:

bin  cuda  cuda-9.0  etc  games  include  lib  man  sbin  share  src

My PATH is:

/home/max/miniconda3/bin:/home/max/miniconda3/condabin:/usr/local/cuda-9.0/bin:/usr/local/cuda-9.0/bin:/usr/local/cuda-9.0/bin:/home/max/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/cuda-10.1/bin:/snap/bin:/home/max/miniconda3/bin:/home/max/miniconda3/bin:/home/max/miniconda3/bin:/usr/local/cuda-9.0/bin:/home/max/miniconda3/bin:/usr/local/cuda-9.0/bin:/home/max/miniconda3/bin:/usr/local/cuda-9.0/bin:/usr/local/cuda-10.1/bin:/home/max/miniconda3/bin/:/usr/local/cuda-9.1/bin:/home/max/miniconda3/bin/

Is 10.1 the source of the issue?

jonso · October 7, 2019, 10:17pm

It looks like it. CUDA 10.1 works fine for me. I would try installing CUDA 10.1 from scratch.
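A quick sanity check before reinstalling: sm_75 (Turing) is only accepted by nvcc from CUDA 10.0 onward, so an nvcc that reports release 9.0 would reject it whatever the driver version is. Below is a rough sketch that parses the `nvcc -V` output and compares it against the minimum toolkit release for an architecture. The function names and the small lookup table are my own for illustration, and the table is deliberately incomplete.

```python
import re
import subprocess

# Minimum CUDA toolkit release whose nvcc accepts each architecture.
# Only a few well-known entries are listed; this is not a complete table.
MIN_TOOLKIT_FOR_ARCH = {
    "sm_70": (9, 0),   # Volta
    "sm_75": (10, 0),  # Turing
    "sm_80": (11, 0),  # Ampere
}


def parse_nvcc_release(version_text):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_supports(version_text, arch):
    """Return True if the toolkit described by `version_text` knows `arch`."""
    return parse_nvcc_release(version_text) >= MIN_TOOLKIT_FOR_ARCH[arch]


def nvcc_version_text():
    """Run whichever nvcc is first on PATH and return its version banner."""
    return subprocess.check_output(["nvcc", "-V"], universal_newlines=True)


if __name__ == "__main__":
    print(toolkit_supports(nvcc_version_text(), "sm_75"))
```

For example, `toolkit_supports("Cuda compilation tools, release 9.0, V9.0.176", "sm_75")` returns `False`, which matches the nvcc error reported above.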

Maximilianxu · October 8, 2019, 1:23am

Thanks for your reply.

My GPU is a GTX 1660 Ti, which is not supported by nvidia-384 or nvidia-driver-390, the drivers that use CUDA 9.x as their runtime. I did try installing nvidia-driver-418, nvidia-driver-430, and nvidia-driver-435, which all use CUDA 10.x, but the same issue came up.

Maximilianxu · October 9, 2019, 1:13pm

Finally, I solved this issue, although I have no idea why it happened.

I used nvidia-driver-430 as the driver, installed cuda-10.0 and cudnn-7.6.4 for 10.0, and changed the TensorFlow version from 1.12.0 to 1.13.1.

Then the error was gone.