Missing cuda/float16 functionality when indexing an array using another array

ibeltagy · October 25, 2019, 4:19am

The code snippet below indexes a tensor using values from another tensor. It works fine with cuda/float32 but throws errors with cuda/float16, which usually means that support for certain float16 functions is missing. Is that the case here? If so, can you point me to what exactly is missing? Maybe I can add it. Thanks.

import tvm
n = tvm.var("n")
m = tvm.var("m")
A = tvm.placeholder((n,), name='A', dtype='float16')  # <<== 'float16'
B = tvm.placeholder((m,), name='B', dtype='int')
C = tvm.compute((m,), lambda i: A[B[i]], name="C")  # <<== indexing array by values of another array
s = tvm.create_schedule(C.op)
s[C].bind(C.op.axis[0], tvm.thread_axis("blockIdx.x"))
f = tvm.build(s, [A, B, C], 'cuda') #  <<== raises the following error

TVMError                                  Traceback (most recent call last)
<ipython-input-11-66825c43388b> in <module>
      7 s = tvm.create_schedule(C.op)
      8 s[C].bind(C.op.axis[0], tvm.thread_axis("blockIdx.x"))
----> 9 f = tvm.build(s, [A, B, C], 'cuda') #  <<== because of some missing 'float16' functionalities

/usr/tvm/python/tvm/build_module.py in build(inputs, args, target, target_host, name, binds)
    634     device_modules = []
    635     for tar, flist in target_flist.items():
--> 636         fhost, mdev = _build_for_device(flist, tar, target_host)
    637         # Save the current lowered functions of the host and the device module.
    638         fhost_all += fhost

/usr/tvm/python/tvm/build_module.py in _build_for_device(flist, target, target_host)
    500     fhost = [ir_pass.LowerIntrin(x, target_host.target_name) for x in fhost]
    501     fhost = [ir_pass.CombineContextCall(x) for x in fhost]
--> 502     mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
    503 
    504     return fhost, mdev

/usr/tvm/python/tvm/codegen.py in build_module(lowered_func, target)
     34         The corressponding module.
     35     """
---> 36     return _Build(lowered_func, target)
     37 
     38 _init_api("tvm.codegen")

/usr/tvm/python/tvm/_ffi/_ctypes/function.py in __call__(self, *args)
    205                 self.handle, values, tcodes, ctypes.c_int(num_args),
    206                 ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
--> 207             raise get_last_ffi_error()
    208         _ = temp_args
    209         _ = args

TVMError: Traceback (most recent call last):
  File "/usr/tvm/src/codegen/opt/build_cuda_on.cc", line 119
TVMError: Check failed: compile_res == NVRTC_SUCCESS (6 vs. 0) : default_program(12): error: class "__half_raw" has no suitable copy constructor
default_program(12): error: class "__half_raw" has no suitable copy constructor

default_program(16): error: class "__half_raw" has no suitable copy constructor

default_program(16): error: class "__half_raw" has no suitable copy constructor

default_program(20): error: class "__half_raw" has no suitable copy constructor

default_program(20): error: class "__half_raw" has no suitable copy constructor

6 errors detected in the compilation of "default_program".
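
For reference, the compute in the snippet above is a plain gather, C[i] = A[B[i]]. Below is a minimal host-side sketch of the result I would expect once the kernel builds, using only the Python standard library. The helper names (to_float16, gather) and the sample values are made up for illustration and are not part of TVM.

```python
import struct

def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

def gather(a, b):
    """Reference for C[i] = A[B[i]]: pick elements of a at the indices in b."""
    return [a[i] for i in b]

# Hypothetical inputs: A holds float16 values, B holds integer indices into A.
A_ref = [to_float16(v) for v in (0.5, 1.5, 2.5, 3.5)]
B_ref = [3, 0, 2, 2, 1]

C_ref = gather(A_ref, B_ref)
print(C_ref)
```

The output of a working cuda/float16 build could be compared against C_ref after copying the result back to the host.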

ibeltagy · October 25, 2019, 4:50am

This is a duplicate of the issue discussed here.