The code snippet below indexes one tensor using values from another tensor. It builds fine for cuda with float32, but the build fails for cuda with float16. An error like this usually means that float16 support is missing for some operation. Is that the case here? If so, can you point me to what exactly is missing? Maybe I can add it. Thanks.
import tvm
n = tvm.var("n")
m = tvm.var("m")
A = tvm.placeholder((n,), name='A', dtype='float16') # <<== 'float16'
B = tvm.placeholder((m), name='B', dtype='int')
C = tvm.compute((m,), lambda i: A[B[i]], name="C") # <<== indexing array by values of another array
s = tvm.create_schedule(C.op)
s[C].bind(C.op.axis[0], tvm.thread_axis("blockIdx.x"))
f = tvm.build(s, [A, B, C], 'cuda') # <<== raises the following error
TVMError Traceback (most recent call last)
<ipython-input-11-66825c43388b> in <module>
7 s = tvm.create_schedule(C.op)
8 s[C].bind(C.op.axis[0], tvm.thread_axis("blockIdx.x"))
----> 9 f = tvm.build(s, [A, B, C], 'cuda') # <<== because of some missing 'float16' functionalities
/usr/tvm/python/tvm/build_module.py in build(inputs, args, target, target_host, name, binds)
634 device_modules = []
635 for tar, flist in target_flist.items():
--> 636 fhost, mdev = _build_for_device(flist, tar, target_host)
637 # Save the current lowered functions of the host and the device module.
638 fhost_all += fhost
/usr/tvm/python/tvm/build_module.py in _build_for_device(flist, target, target_host)
500 fhost = [ir_pass.LowerIntrin(x, target_host.target_name) for x in fhost]
501 fhost = [ir_pass.CombineContextCall(x) for x in fhost]
--> 502 mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
503
504 return fhost, mdev
/usr/tvm/python/tvm/codegen.py in build_module(lowered_func, target)
34 The corressponding module.
35 """
---> 36 return _Build(lowered_func, target)
37
38 _init_api("tvm.codegen")
/usr/tvm/python/tvm/_ffi/_ctypes/function.py in __call__(self, *args)
205 self.handle, values, tcodes, ctypes.c_int(num_args),
206 ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
--> 207 raise get_last_ffi_error()
208 _ = temp_args
209 _ = args
TVMError: Traceback (most recent call last):
File "/usr/tvm/src/codegen/opt/build_cuda_on.cc", line 119
TVMError: Check failed: compile_res == NVRTC_SUCCESS (6 vs. 0) : default_program(12): error: class "__half_raw" has no suitable copy constructor
default_program(12): error: class "__half_raw" has no suitable copy constructor
default_program(16): error: class "__half_raw" has no suitable copy constructor
default_program(16): error: class "__half_raw" has no suitable copy constructor
default_program(20): error: class "__half_raw" has no suitable copy constructor
default_program(20): error: class "__half_raw" has no suitable copy constructor
6 errors detected in the compilation of "default_program".
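
In case it helps with triage, here is a small sketch that condenses the NVRTC output above into the distinct source lines of the generated kernel that fail. The helper name `summarize_nvrtc_errors` and the regular expression are my own for illustration and are not part of TVM. It assumes the diagnostics keep the `default_program(<line>): error: <message>` shape seen in this log.

```python
import re

# Matches NVRTC diagnostics such as:
#   default_program(12): error: class "__half_raw" has no suitable copy constructor
# (pattern assumed from the log above, not taken from any NVRTC specification)
NVRTC_ERROR = re.compile(r"default_program\((\d+)\): error: (.+)")


def summarize_nvrtc_errors(message):
    """Map each distinct error message to the ordered list of distinct line numbers."""
    summary = {}
    for match in NVRTC_ERROR.finditer(message):
        line_no = int(match.group(1))
        text = match.group(2).strip()
        lines = summary.setdefault(text, [])
        if line_no not in lines:  # NVRTC reports each failing line twice here
            lines.append(line_no)
    return summary


# The error text from the traceback above, pasted verbatim.
log = """TVMError: Check failed: compile_res == NVRTC_SUCCESS (6 vs. 0) : default_program(12): error: class "__half_raw" has no suitable copy constructor
default_program(12): error: class "__half_raw" has no suitable copy constructor
default_program(16): error: class "__half_raw" has no suitable copy constructor
default_program(16): error: class "__half_raw" has no suitable copy constructor
default_program(20): error: class "__half_raw" has no suitable copy constructor
default_program(20): error: class "__half_raw" has no suitable copy constructor
6 errors detected in the compilation of "default_program".
"""

print(summarize_nvrtc_errors(log))
# → {'class "__half_raw" has no suitable copy constructor': [12, 16, 20]}
```

So the six reported errors come down to one message on three lines (12, 16 and 20) of the generated CUDA source, which is probably where to look first.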