I am working on torchvision models that are compiled to CUDA and LLVM (host) using
relay.build() in Python.
I am trying to attach 3 additional arguments to the
void_args block of memory that is passed to
tvm/runtime/cuda_module.cc for each of the kernels.
I have already modified the kernels by compiling them with clang++ and a custom pass from within
tvm_callback_cuda_compile, so by the time the relay.build() stage finishes, each kernel has 3 additional arguments plus a small block of added code.
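My real instrumentation happens inside the clang++ pass, but its effect on every kernel signature is equivalent to this source-level sketch (the parameter names here are hypothetical, not my actual ones):

```python
import re

EXTRA_PARAMS = "int dbg_a, int dbg_b, int dbg_c"  # hypothetical names

def append_extra_params(cuda_src):
    """Append three extra scalar parameters to each __global__ entry point.

    This is only a model of what my clang++ pass does to each kernel's
    signature; the real pass also inserts a small block of code.
    """
    def repl(m):
        params = m.group(2).strip()
        extended = (params + ", " + EXTRA_PARAMS) if params else EXTRA_PARAMS
        return m.group(1) + extended + m.group(3)

    return re.sub(
        r'(extern "C" __global__ void \w+\()([^)]*)(\))', repl, cuda_src)
```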
I figured the next step would be to pass values for these arguments to the kernels, so I added some code in
GraphRuntime::CreateTVMOp that essentially push_back's the values onto
arg_ptr->arg_values. What I didn't account for is that these kernel launches are first invoked from the generated "llvm" host code, which configures the launch and passes the arguments.
Whenever I try to run the
GraphRuntime ops, I get an assertion error:

TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (num_args == 4), fused_nn_conv2d_add_nn_relu_3: num_args should be 4

which makes sense, since the number of arguments passed to each op is checked in the generated LLVM host code.
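As far as I can tell, the generated host stub behaves like this simplified model (plain Python, not actual TVM code): the argument count is baked in at codegen time and checked before anything is unpacked, so the extra values I push in CreateTVMOp trip the assert before the kernel launch is ever reached.

```python
# Simplified model of the generated LLVM host stub for
# fused_nn_conv2d_add_nn_relu_3; not actual TVM code.
EXPECTED_NUM_ARGS = 4  # fixed when relay.build() generates the stub

def host_stub(arg_values, arg_tcodes):
    # The generated host code asserts the count first...
    assert len(arg_values) == EXPECTED_NUM_ARGS, \
        "num_args should be %d" % EXPECTED_NUM_ARGS
    # ...and only then unpacks the DLTensor handles, computes the
    # launch configuration, and hands void_args to the CUDA kernel.
    return len(arg_values)
```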
My question is: is there a way to do this without heavily modifying the LLVM host code generation phase in
llvm_module.cc etc.? Ideally, I'd like to pass some arguments from Python to
GraphRuntime in C++ and have them added to the kernel launches.
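To make the goal concrete, here is a toy model (plain Python, not TVM API; all names hypothetical) of the flow I have in mind: Python stashes the extra values per kernel, and the C++ side (CreateTVMOp in my case) would look them up when building each op's argument list.

```python
# Toy model of the flow I'd like; every name here is hypothetical.
_extra_args = {}

def set_extra_args(op_name, values):
    """Python side: stash the 3 extra values for a given kernel."""
    _extra_args[op_name] = list(values)

def create_tvm_op(op_name, base_args):
    """Model of the C++ side (GraphRuntime::CreateTVMOp in my case):
    append the stashed values, if any, to the op's argument list."""
    return list(base_args) + _extra_args.get(op_name, [])
```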
I cannot simply append new arguments to
void_args, as I don't know each kernel's argument sizes, so rebuilding that block is out of the question.
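To illustrate the problem: as I understand it, void_args in the launch path is essentially a void** array, one pointer per kernel argument, with each pointer referring to storage whose size depends on that argument's type. Extending it would therefore require knowing every kernel's exact argument layout. A ctypes sketch of that shape:

```python
import ctypes

def build_void_args(py_args):
    """Build a void**-style array from typed values (illustrative only).

    Each element of the returned array points at separately sized
    storage, which is why appending entries blindly is not possible
    without knowing every argument's type and size.
    """
    storage = [ctypes.c_float(v) if isinstance(v, float) else ctypes.c_int(v)
               for v in py_args]
    arr = (ctypes.c_void_p * len(storage))(
        *[ctypes.cast(ctypes.pointer(s), ctypes.c_void_p) for s in storage])
    return arr, storage  # storage must stay alive alongside the array
```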
Any ideas would be greatly appreciated!