I am working on torchvision models that are compiled to CUDA and LLVM (host) using
relay.build() in Python.
I am trying to attach 3 additional arguments to the
void_args block of memory that is passed to
tvm/runtime/cuda_module.cc for each of the kernels.
I have already modified the kernels by compiling them with clang++ and a custom pass from within
tvm_callback_cuda_compile, so by the time the relay.build() stage finishes, each kernel has 3 additional arguments plus a small block of added code.
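My real instrumentation happens inside the clang++ pass, but its effect on every kernel signature is equivalent to this source-level sketch (the parameter names here are hypothetical, not my actual ones):

```python
import re

EXTRA_PARAMS = "int dbg_a, int dbg_b, int dbg_c"  # hypothetical names

def append_extra_params(cuda_src):
    """Append three extra scalar parameters to each __global__ entry point.

    This is only a model of what my clang++ pass does to each kernel's
    signature; the real pass also inserts a small block of code.
    """
    def repl(m):
        params = m.group(2).strip()
        extended = (params + ", " + EXTRA_PARAMS) if params else EXTRA_PARAMS
        return m.group(1) + extended + m.group(3)

    return re.sub(
        r'(extern "C" __global__ void \w+\()([^)]*)(\))', repl, cuda_src)
```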
I figured the next step would be to pass values for these arguments to the kernels, so I added some code in
GraphRuntime::CreateTVMOp that essentially push_back's the values onto
arg_ptr->arg_values. What I didn't account for is that these kernel launches are first invoked from the generated "llvm" host code, which configures the launch and passes the arguments.
Whenever I try to run the
GraphRuntime ops, I get an assertion error:

TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (num_args == 4), fused_nn_conv2d_add_nn_relu_3: num_args should be 4

which makes sense, since the number of arguments passed to each op is checked in the generated LLVM host code.
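As far as I can tell, the generated host stub behaves like this simplified model (plain Python, not actual TVM code): the argument count is baked in at codegen time and checked before anything is unpacked, so the extra values I push in CreateTVMOp trip the assert before the kernel launch is ever reached.

```python
# Simplified model of the generated LLVM host stub for
# fused_nn_conv2d_add_nn_relu_3; not actual TVM code.
EXPECTED_NUM_ARGS = 4  # fixed when relay.build() generates the stub

def host_stub(arg_values, arg_tcodes):
    # The generated host code asserts the count first...
    assert len(arg_values) == EXPECTED_NUM_ARGS, \
        "num_args should be %d" % EXPECTED_NUM_ARGS
    # ...and only then unpacks the DLTensor handles, computes the
    # launch configuration, and hands void_args to the CUDA kernel.
    return len(arg_values)
```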
My question is: is there a way to do this without heavily modifying the LLVM host code generation phase in
llvm_module.cc etc.? Ideally, I'd like to pass some arguments from Python to
GraphRuntime in C++ and have them added to the kernel launches.
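To make the goal concrete, here is a toy model (plain Python, not TVM API; all names hypothetical) of the flow I have in mind: Python stashes the extra values per kernel, and the C++ side (CreateTVMOp in my case) would look them up when building each op's argument list.

```python
# Toy model of the flow I'd like; every name here is hypothetical.
_extra_args = {}

def set_extra_args(op_name, values):
    """Python side: stash the 3 extra values for a given kernel."""
    _extra_args[op_name] = list(values)

def create_tvm_op(op_name, base_args):
    """Model of the C++ side (GraphRuntime::CreateTVMOp in my case):
    append the stashed values, if any, to the op's argument list."""
    return list(base_args) + _extra_args.get(op_name, [])
```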
I cannot simply append new arguments to
void_args, as I don't know each kernel's argument sizes, so rebuilding that block is out of the question.
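To illustrate the problem: as I understand it, void_args in the launch path is essentially a void** array, one pointer per kernel argument, with each pointer referring to storage whose size depends on that argument's type. Extending it would therefore require knowing every kernel's exact argument layout. A ctypes sketch of that shape:

```python
import ctypes

def build_void_args(py_args):
    """Build a void**-style array from typed values (illustrative only).

    Each element of the returned array points at separately sized
    storage, which is why appending entries blindly is not possible
    without knowing every argument's type and size.
    """
    storage = [ctypes.c_float(v) if isinstance(v, float) else ctypes.c_int(v)
               for v in py_args]
    arr = (ctypes.c_void_p * len(storage))(
        *[ctypes.cast(ctypes.pointer(s), ctypes.c_void_p) for s in storage])
    return arr, storage  # storage must stay alive alongside the array
```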
Any ideas would be greatly appreciated!