I don’t think there is a plan to add a GPU schedule for matmul. But there is the batched_matmul schedule for GPU. You can use that from our onnx frontend.
Thanks @masahi, could you guide me towards writing the GPU schedule for matmul, since one might already exist for CPU?
Hi @masahi @srkreddy1238 @FrozenGene @tqchen
I am facing the same error mentioned above.
What line did you change so that it worked for
fuse_reshape_broadcast_mul_conv2d_broadcast_mul_broadcast_add_elemwise_add.
Did you forget to bind?
I’m getting the below error:
nnvm._base.NNVMError: ValueError: Direct host side access to device memory is detected in fuse_matmul_relu. Did you forget to bind?
I did not understand how to use batched_matmul for this.
Any help here?
Hi @masahi @Hkathuria @yqwang I am facing the same issue on a Metal GPU.
Error: ValueError: Direct host side access to device memory is detected in addone. Did you forget to bind?
The Python script is:

    import tvm
    import os

    def prepare_test_libs(base_path):
        n = tvm.var("n")
        A = tvm.placeholder((n,), name='A')
        B = tvm.compute(A.shape, lambda *i: A(*i) + 1.0, name='B')
        s = tvm.create_schedule(B.op)
        # Compile library as dynamic library
        fadd_dylib = tvm.build(s, [A, B], "metal", name="addone")
        dylib_path = os.path.join(base_path, "test_addone_dll.dylib")
        fadd_dylib.export_library(dylib_path)

    if __name__ == "__main__":
        curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
        prepare_test_libs(os.path.join(curr_path, "./lib"))

I am building it for metal device.
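You need to add a proper schedule for the GPU target, for example by binding the compute axes to GPU blocks and threads. Without that, the generated function tries to access device memory from host code, which is what the error reports.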
see https://docs.tvm.ai/tutorials/tensor_expr_get_started.html#schedule-the-computation
Something like:
s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
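To make the effect of binding concrete, here is a minimal plain-Python sketch of the index arithmetic behind a split followed by a bind, applied to the `addone` computation from the script above. It is not TVM code and does not call any TVM API. The names `addone_kernel` and `launch_addone` and the factor of 64 are hypothetical, chosen only for illustration. The comments show roughly what the matching schedule calls would look like in the TVM API used in this thread.

```python
# Plain-Python model of a split + bind schedule for B[i] = A[i] + 1.0.
# This only mimics the index arithmetic; it is NOT TVM code. In the TVM API
# of this thread's era, the corresponding schedule calls would be roughly:
#   bx, tx = s[B].split(B.op.axis[0], factor=64)
#   s[B].bind(bx, tvm.thread_axis("blockIdx.x"))
#   s[B].bind(tx, tvm.thread_axis("threadIdx.x"))


def addone_kernel(block_idx, thread_idx, factor, a, b):
    """Work done by one GPU thread: at most one element of the output."""
    i = block_idx * factor + thread_idx
    if i < len(a):  # guard for the last, partially filled block
        b[i] = a[i] + 1.0


def launch_addone(a, factor=64):
    """Emulate the launch grid: outer axis -> blocks, inner axis -> threads."""
    n = len(a)
    b = [0.0] * n
    num_blocks = (n + factor - 1) // factor  # ceil(n / factor)
    for block_idx in range(num_blocks):      # plays the role of blockIdx.x
        for thread_idx in range(factor):     # plays the role of threadIdx.x
            addone_kernel(block_idx, thread_idx, factor, a, b)
    return b


result = launch_addone([float(i) for i in range(100)], factor=64)
print(result[:3], len(result))
```

In the script above, the split and bind calls would go right after `tvm.create_schedule(B.op)` and before `tvm.build`, so that every loop over the output is mapped to GPU threads rather than left as a host-side loop.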