[ERROR] FP16 CUDA compilation error

I have some problems compiling float16 ResNet18 and inception_v3 models:
RuntimeError: Compilation error:
/tmp/tmpoiw0p21t/my_kernel.cu(2207): warning: attribute "__shared__" does not apply here

/tmp/tmpoiw0p21t/my_kernel.cu(2207): error: no operator "=" matches these operands
            operand types are: volatile half = half

/tmp/tmpoiw0p21t/my_kernel.cu(2210): warning: attribute "__shared__" does not apply here

It seems that I should modify the CodeGenCUDA::Finish() function. What should I do?
@vinx13 @ibeltagy @hhhh @xyzhou @ydy @tqchen @comaniac
Thank you

I got related errors yesterday, and I think the issue "[ERROR] Half type support in NVRTC" is related. For now, I am using an old commit until the NVRTC issues are fixed.

    diagonaled_mm = tvm.build(s, [X, Y, Z, D, w, w_upper, padding, transpose_t1, t3d3], target=device, target_host=tgt_host, name='diagonaled_mm')
/usr/tvm/python/tvm/build_module.py:636: in build
    fhost, mdev = _build_for_device(flist, tar, target_host)
/usr/tvm/python/tvm/build_module.py:502: in _build_for_device
    mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
/usr/tvm/python/tvm/codegen.py:36: in build_module
    return _Build(lowered_func, target)
/usr/tvm/python/tvm/_ffi/_ctypes/function.py:207: in __call__
    raise get_last_ffi_error()
E   tvm._ffi.base.TVMError: Traceback (most recent call last):
E               operand types are: half * half
E               function "operator*(__half, __half)"
E               function "operator*(const __half &, const __half &)"
E     File "/usr/tvm/src/codegen/opt/build_cuda_on.cc", line 119
E   TVMError: Check failed: compile_res == NVRTC_SUCCESS (6 vs. 0) : default_program(31): error: more than one operator "*" matches these operands:
E   default_program(31): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E   default_program(31): error: more than one operator "*" matches these operands:
E               function "operator*(const __half &, const __half &)"
E               function "operator*(__half, __half)"
E               operand types are: half * half
E   default_program(31): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E   default_program(31): error: more than one operator "*" matches these operands:
E               function "operator*(const __half &, const __half &)"
E               function "operator*(__half, __half)"
E               operand types are: half * half
E   default_program(31): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E   default_program(46): error: more than one operator "*" matches these operands:
E               function "operator*(const __half &, const __half &)"
E               function "operator*(__half, __half)"
E               operand types are: half * half
E   default_program(46): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E   default_program(46): error: more than one operator "*" matches these operands:
E               function "operator*(const __half &, const __half &)"
E               function "operator*(__half, __half)"
E               operand types are: half * half
E   default_program(46): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E   default_program(46): error: more than one operator "*" matches these operands:
E               function "operator*(const __half &, const __half &)"
E               function "operator*(__half, __half)"
E               operand types are: half * half
E   default_program(46): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E   12 errors detected in the compilation of "default_program".

Thanks for your reply.
What's the old commit that you are using now?

I found that the softmax op raises this issue.

I'm using a month-old commit just because that's when I started the project, but I'm sure more recent commits work as well.

@vinx13 @Hzfengsy can you look a bit into the issue?

Do you still get the error after https://github.com/apache/incubator-tvm/pull/4239 was merged?

Yes, the error I posted earlier is from after that PR.

Can you provide some code that reproduces the problem, so that we can track down the error more easily?

Yes. With this commit, I get the same errors that ibeltagy reported:

           default_program(46): error: more than one operator "*" matches these operands:
                     function "operator*(const __half &, const __half &)"
                     function "operator*(__half, __half)"
                     operand types are: half * half

Okay, I will look into it.

Can you check whether removing these lines works for you?

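import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime
from tvm.relay.testing import resnet


def test_downcast_fp16_resnet():
    image_shape = (3, 224, 224)
    net, params = resnet.get_workload(image_shape=image_shape, dtype='float32')
    # Downcast pass used in the original report
    func = relay.neo.downcast_fp16(net['main'])
    with relay.build_config(opt_level=3):
        graph_fp16, lib_fp16, params_fp16 = relay.build(func, params=params, target='cuda')
    rt_fp16 = graph_runtime.create(graph_fp16, lib_fp16, tvm.gpu())
    rt_fp16.set_input(**params_fp16)  # load the weights returned by relay.build
    for _ in range(100):
        # get_workload defaults to batch_size=1, so the input shape is (1, 3, 224, 224)
        X = tvm.nd.array(np.random.random_sample((1,) + image_shape).astype('float32'))
        rt_fp16.set_input('data', X)
        rt_fp16.run()  # execute the graph before reading its output
        out_fp16 = rt_fp16.get_output(0).asnumpy()


if __name__ == "__main__":
    test_downcast_fp16_resnet()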

Thanks for your reply. This code reproduces these errors.

For compute capability >= 5.3 (sm_53 and newer), CUDA already provides arithmetic operators for half, so we shouldn't add these overloaded operators. @xyzhou
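Following the suggestion above, the overloads could be emitted only when the target needs them. A minimal sketch, assuming the target arch is available as a string like "sm_70"; compute_capability and needs_half_operator_overloads are hypothetical helpers, not TVM functions, and the 5.3 threshold is the one stated in the comment above:

```python
import re


def compute_capability(arch: str) -> int:
    """Parse an arch string such as "sm_70" into an integer (70 means 7.0)."""
    match = re.fullmatch(r"sm_(\d+)", arch.strip())
    if match is None:
        raise ValueError(f"unrecognized CUDA arch: {arch!r}")
    return int(match.group(1))


def needs_half_operator_overloads(arch: str) -> bool:
    """Hypothetical codegen guard: emit custom half arithmetic operators only
    below compute capability 5.3, where (per this thread) CUDA does not
    already define them. Emitting them on newer archs causes the
    'more than one operator "*" matches these operands' ambiguity."""
    return compute_capability(arch) < 53


print(needs_half_operator_overloads("sm_60"))  # Tesla P100
print(needs_half_operator_overloads("sm_70"))  # Tesla V100
```

With this guard, neither a P100 (sm_60) nor a V100 (sm_70) would get the extra overloads. An alternative design is to keep one source string and wrap the overloads in the generated CUDA with a preprocessor check on `__CUDA_ARCH__`, so the decision is made at device-compile time instead of in the code generator.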

Yes, I removed these lines. The errors ibeltagy reported no longer appear, but the error from my first post still does:

/tmp/tmpoiw0p21t/my_kernel.cu(2207): error: no operator "=" matches these operands
                operand types are: volatile half = half

/tmp/tmpoiw0p21t/my_kernel.cu(2210): warning: attribute "__shared__" does not apply here

I checked the CUDA code generated by TVM and found that the error comes from the fused_nn_max_pool2d_kernel0 function, specifically this line:
((volatile __shared__ half*)red_buf0)[((int)threadIdx.x)] = tensor_rf[0];
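Mapping an nvcc line number such as my_kernel.cu(2207) back to the kernel that contains it can be scripted. A minimal sketch, assuming the generated CUDA source is available as a string and that each kernel is declared as `__global__ void <name>(`, optionally with a `__launch_bounds__(...)` qualifier; kernel_for_line is a hypothetical helper, not part of TVM:

```python
import re

# Matches kernel definitions such as:
#   extern "C" __global__ void fused_nn_max_pool2d_kernel0(
#   extern "C" __global__ void __launch_bounds__(256) some_kernel(
_KERNEL_DEF = re.compile(r"__global__\s+void\s+(?:__launch_bounds__\([^)]*\)\s+)?(\w+)\s*\(")


def kernel_for_line(cuda_src: str, lineno: int):
    """Return the name of the last __global__ kernel declared at or before the
    1-based line `lineno` (as reported by nvcc), or None if there is none."""
    lines = cuda_src.splitlines()
    if not 1 <= lineno <= len(lines):
        raise ValueError(f"line {lineno} is outside the source ({len(lines)} lines)")
    name = None
    for line in lines[:lineno]:
        match = _KERNEL_DEF.search(line)
        if match:
            name = match.group(1)
    return name
```

Given the full my_kernel.cu text, kernel_for_line(src, 2207) should name the enclosing kernel. It only looks at declarations, so a line inside a device helper function defined between kernels would be attributed to the preceding kernel.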

Which CUDA arch are you using?

Tesla P100

Thank you

Can you try CUDA >= 9.2? It seems that assigning to a volatile half wasn't supported until CUDA 9.2.
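To check which toolkit is installed, the release number can be parsed from `nvcc --version`, whose output contains a line like "Cuda compilation tools, release 10.1, V10.1.243". A minimal sketch; the helper names are hypothetical, and the (9, 2) threshold comes from the suggestion above rather than from NVIDIA documentation:

```python
import re
import subprocess

_RELEASE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(version_text: str):
    """Extract (major, minor) from `nvcc --version` output."""
    match = _RELEASE.search(version_text)
    if match is None:
        raise ValueError("could not find 'release X.Y' in nvcc output")
    return int(match.group(1)), int(match.group(2))


def supports_volatile_half_assign(release) -> bool:
    # Threshold taken from the suggestion in this thread (CUDA 9.2).
    return tuple(release) >= (9, 2)


def installed_nvcc_release(nvcc: str = "nvcc"):
    """Run nvcc (assumed to be on PATH) and return its (major, minor) release."""
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=True)
    return parse_nvcc_release(out.stdout)
```

Usage: supports_volatile_half_assign(installed_nvcc_release()). Passing this check only covers assignment to a volatile half; it says nothing about arithmetic on volatile half operands.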

Thanks! I was having the same issues as @wda; I will check my CUDA version as soon as possible.

After merging the latest tvm, I got:

/tmp/tmpc2ksk9tx/my_kernel.cu(250): error: more than one operator "*" matches these operands:
            function "operator*(const __half &, const __half &)"
            function "operator*(__half, __half)"
            operand types are: half * half

After removing L65-L67 as you mentioned above, I got:

/tmp/tmp615xi8fe/my_kernel.cu(9068): warning: attribute "__shared__" does not apply here
/tmp/tmp615xi8fe/my_kernel.cu(9068): warning: attribute "__shared__" does not apply here
/tmp/tmp615xi8fe/my_kernel.cu(9068): warning: attribute "__shared__" does not apply here
/tmp/tmp615xi8fe/my_kernel.cu(9068): error: no operator "+" matches these operands
            operand types are: volatile half + volatile half
/tmp/tmp615xi8fe/my_kernel.cu(9072): warning: attribute "__shared__" does not apply here
12 errors detected in the compilation of "/tmp/tmpxft_00003542_00000000-6_my_kernel.cpp1.ii".

My CUDA version is 10.1 and the device is a Tesla V100 (sm_70). The test script is:

import tvm
from tvm import relay
from tvm.relay import testing

target = 'cuda'

mod, params = testing.resnet.get_workload(dtype='float16')
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, params=params, target=target)