Tvm.build breaks with `cuda` and `float16`


#1

Running the following code snippet breaks. Changing it to float32 or to cpu works fine. Any thought what could be wrong ? Thanks

import tvm
W = 128
n = tvm.convert(16304)
m = tvm.convert(64)
c = tvm.convert(2 * W + 1)
X = tvm.placeholder((n,m), name='X', dtype='float16')
Y = tvm.placeholder((n,m), name='Y', dtype='float16')
k = tvm.reduce_axis((0, m), name='k')
Z = tvm.compute((n,c), lambda i,j :  tvm.sum(X[i,k]*Y[ i + j - W, k], axis=k), name='Z')
s = tvm.create_schedule(Z.op)
s[Z].bind(Z.op.axis[0], tvm.thread_axis("blockIdx.x"))
s[Z].bind(Z.op.axis[1], tvm.thread_axis("blockIdx.y"))
fmm = tvm.build(s, [X, Y, Z], 'cuda', target_host='llvm', name='fmm')

Error:

TVMError                                  Traceback (most recent call last)                                                                                                                                                                                                   [2/7760]
<ipython-input-4-a76d0168f7bf> in <module>
     14 s[Z].bind(Z.op.axis[0], tvm.thread_axis("blockIdx.x"))
     15 s[Z].bind(Z.op.axis[1], tvm.thread_axis("blockIdx.y"))
---> 16 fmm = tvm.build(s, [X, Y, Z], 'cuda', target_host='llvm', name='fmm')

/usr/tvm/python/tvm/build_module.py in build(inputs, args, target, target_host, name, binds)
    619     device_modules = []
    620     for tar, flist in target_flist.items():
--> 621         fhost, mdev = _build_for_device(flist, tar, target_host)
    622         # Save the current lowered functions of the host and the device module.
    623         fhost_all += fhost

/usr/tvm/python/tvm/build_module.py in _build_for_device(flist, target, target_host)
    486     fhost = [ir_pass.LowerIntrin(x, target_host.target_name) for x in fhost]
    487     fhost = [ir_pass.CombineContextCall(x) for x in fhost]
--> 488     mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
    489
    490     return fhost, mdev

/usr/tvm/python/tvm/codegen.py in build_module(lowered_func, target)
     34         The corressponding module.
     35     """
---> 36     return _Build(lowered_func, target)
     37
     38 _init_api("tvm.codegen")

/usr/tvm/python/tvm/_ffi/_ctypes/function.py in __call__(self, *args)
    208                 self.handle, values, tcodes, ctypes.c_int(num_args),
    209                 ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
--> 210             raise get_last_ffi_error()
    211         _ = temp_args
    212         _ = args

TVMError: Traceback (most recent call last):
            function "__half::operator __nv_bool() const"
            function "__half::operator unsigned long long() const"
            function "__half::operator long long() const"
            function "__half::operator unsigned int() const"
            function "__half::operator int() const"
            function "__half::operator unsigned short() const"
            function "__half::operator short() const"
            function "__half::operator float() const"
  File "/usr/tvm/src/codegen/opt/build_cuda_on.cc", line 119
TVMError: Check failed: compile_res == NVRTC_SUCCESS (6 vs. 0) : default_program(5): error: more than one conversion function from "half" to a built-in type applies:
default_program(5): error: more than one conversion function from "half" to a built-in type applies:
            function "__half::operator float() const"
            function "__half::operator short() const"
            function "__half::operator unsigned short() const"
            function "__half::operator int() const"
            function "__half::operator unsigned int() const"
            function "__half::operator long long() const"
            function "__half::operator unsigned long long() const"
            function "__half::operator __nv_bool() const"

2 errors detected in the compilation of "default_program".```

#2

CUDA half doesn’t have operator overloading for a few arithmetic operations. For example current codegen uses * instead of __hmul, which possibly caused this error.


#3

For example current codegen uses * instead of __hmul ,

Where do you see this in the output?

Also, is there a fix for this?


#4

It is my guess…
We need to override this https://github.com/dmlc/tvm/blob/master/src/codegen/codegen_c.cc#L475
in codegen_cuda