Tensorize in multithreaded environment


I have a small test case where I do a 2-D matrix addition:
import tvm
A = tvm.placeholder((4, 16), name='input1')
B = tvm.placeholder((4, 16), name='input2')
C = tvm.compute((4, 16),
                lambda m, n: A[m, n] + B[m, n],
                name='output')

I tile the leading dimension by 2 and parallelize the outer loop as follows:

s = tvm.create_schedule(C.op)
m, n = s[C].op.axis
m_outer, m_inner = s[C].split(m, factor=2)
s[C].parallel(m_outer)

I then match the inner matrix add with a tensorize routine, and I implement the matrix add itself as my own C function, called as a packed func. Inside that function I print the address of each pointer passed in. With a single thread (TVM_NUM_THREADS=1), the pointers are correctly offset before being passed to the function. But with 2 or more threads, the input pointer is not offset for any thread other than the first one.
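To make the expected behaviour concrete, here is a pure-Python simulation (no TVM involved) of what the parallel tensorized loop should do: each m_outer iteration, whichever thread runs it, should hand the kernel a base offset of m_outer * 32 elements (2 rows of 16 columns). `matadd_tile` is a hypothetical stand-in for the packed function, and a thread pool stands in for TVM's parallel runtime; this is a sketch of the intended semantics, not of TVM's implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

ROWS, COLS, TILE_ROWS = 4, 16, 2   # shapes from the example above
TILE_ELEMS = TILE_ROWS * COLS      # 32 elements per m_outer tile


def matadd_tile(a, b, out, offset, log, lock):
    """Hypothetical stand-in for test_intrin: adds one 2x16 tile that
    starts at flat index `offset` of row-major buffers a, b, out."""
    with lock:
        log.append(offset)  # record the base offset this call received
    for i in range(offset, offset + TILE_ELEMS):
        out[i] = a[i] + b[i]


def run_parallel(a, b, num_threads=2):
    out = [0] * (ROWS * COLS)
    log, lock = [], threading.Lock()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # One task per m_outer iteration, mirroring `parallel (m.outer, 0, 2)`.
        futures = [
            pool.submit(matadd_tile, a, b, out, m_outer * TILE_ELEMS, log, lock)
            for m_outer in range(ROWS // TILE_ROWS)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out, sorted(log)


a = list(range(ROWS * COLS))
b = [100] * (ROWS * COLS)
out, offsets = run_parallel(a, b)
print(offsets)  # [0, 32]
```

The point is that the offsets seen by the kernel must be the same regardless of the thread count; the bug described above is that with 2+ threads the second tile's pointer behaves as if its offset were 0.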


Please look here for the sample code.


If we need the input of tensorization to have a non-zero offset, we might need to redefine the buffer abstraction (by having buffer_alignment).


Hi Tianqi,

Thanks for the response. Is it possible to pass the loop index variables in as arguments to the tensorized function?


So in the output of tvm.lower after tensorize:

After tensorize
produce output {
parallel (m.outer, 0, 2) {
tvm_call_packed("test_intrin", tvm_address_of(input1[(m.outer*32)]), tvm_address_of(input2[(m.outer*32)]), tvm_address_of(output[(m.outer*32)]))

We see that tvm_address_of points to the right offset in each buffer. Can the implementation of tvm_address_of be modified so that it also yields the right offset when the loop is parallelized?
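Where the 32 comes from, assuming the default compact row-major layout of the (4, 16) buffers: each m.outer iteration covers 2 rows of 16 elements, so its tile starts at flat offset 32 * m.outer. This is the offset the second thread's pointer should reflect.

```latex
\begin{aligned}
\mathrm{idx}(m, n) &= 16\,m + n, \qquad m = 2\,m_{\mathrm{outer}} + m_{\mathrm{inner}},\\
\mathrm{idx}(2\,m_{\mathrm{outer}},\, 0) &= 32\,m_{\mathrm{outer}} \in \{0,\ 32\} \quad \text{for } m_{\mathrm{outer}} \in \{0, 1\}.
\end{aligned}
```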


OK, this might be caused by a bug in TVM code generation for packed function calls inside a parallel body. Can you try swapping call_packed for call_extern for now, and directly provide an extern "C" function with the same signature? That should work around the problem.



Can you give an example of how to use call_extern, please? I'm passing the output in as a pointer to the external function, so it has a void return type. Is this supported?


You can simply use int as the return type and ignore the value for void functions; that should work.


See https://github.com/dmlc/tvm/blob/master/python/tvm/intrin.py#L134


Some questions:
Do you use TVM_REGISTER_GLOBAL to register the extern function?
How do you export the extern function? By loading a .so?
Do you need get_global_func?
Are the arguments to the function C++ datatypes, or TVMArgValues as for a packed func?


For a normal extern function you don't need to register it; just make sure it is exported as extern "C" and available in the runtime. You cannot get it through tvm.get_global_func, because it is a typed C function rather than a packed function.


I get the following error:

LLVM ERROR: Program used external function 'test_intrin' which could not be resolved!

I declare the function as extern "C" in the .so file.

import tvm
f = tvm.module.load("./test.so")

def intrin_func(ins, outs):
    body = tvm.call_extern("int32", "test_intrin", ins[0].access_ptr("rw"), ins[1].access_ptr("rw"), outs[0].access_ptr("rw"))
    return body

And here is my function:

extern "C" int test_intrin(void *input, void *input2, void *output) {
  printf("input %p\n", input);  // %p is the portable printf format for a pointer
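One quick sanity check before digging into runtime loading is to confirm that the .so really exports the symbol under its unmangled C name (a C++ definition without extern "C" would export a mangled name instead). The sketch below uses Python's ctypes, which is independent of TVM; `exports_symbol` is a hypothetical helper, and "./test.so" is just the path used in this thread.

```python
import ctypes
import ctypes.util


def exports_symbol(lib_path, name):
    """Return True if the shared library at lib_path exports `name`
    under exactly that (unmangled) name."""
    lib = ctypes.CDLL(lib_path)
    # ctypes raises AttributeError when the symbol cannot be found,
    # so hasattr() doubles as an existence check.
    return hasattr(lib, name)


# In this thread the call would be: exports_symbol("./test.so", "test_intrin")
# Demonstrated here on the C library; if find_library returns None,
# CDLL(None) falls back to the symbols of the running process.
libc_path = ctypes.util.find_library("c")
print(exports_symbol(libc_path, "strlen"))  # True
```

If this returns True for test_intrin but LLVM still cannot resolve it, the symbol is exported but not visible in the scope the generated code searches.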




Thank you for the response. The issue I'm facing is that I want to use call_extern to invoke a function in a dynamic .so library, which I load using tvm.module.load.

In your approach, you implement the extern function as LLVM inline assembly, but in my case the function is implemented in the dynamic .so library and I want to call it there. Is there a way to do this?



You can, as long as the .so library is loaded into the TVM runtime before tvm.module.load.



The .so library is an external library that I compiled with a C++ compiler. I load it using tvm.module.load to expose it to the TVM runtime, but the external function implemented in the library is not resolved, as shown by the error I reported earlier.



Please check out the example here: https://github.com/dmlc/tvm/pull/2156. As I said in my last post, it is fine as long as the .so library is loaded in the TVM runtime.

However, we do need to load the .so as an RTLD_GLOBAL object so that it is visible to other .so libraries. Alternatively, you can re-expose it in the TVM runtime and have the TVM runtime link against your DLL.
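A minimal sketch of the RTLD_GLOBAL approach from Python, assuming the library is preloaded with ctypes before TVM builds or loads the module that calls it. `preload_global` is a hypothetical helper name; ctypes.CDLL and ctypes.RTLD_GLOBAL are standard Python. Whether TVM's LLVM-generated code then resolves the symbol is per the suggestion above, not something this snippet checks.

```python
import ctypes
import ctypes.util


def preload_global(lib_path):
    """dlopen lib_path with RTLD_GLOBAL so its symbols join the global
    symbol scope and become visible to code loaded later in this process."""
    return ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)


# In this thread's setting (path assumed), this would run first:
#   preload_global("./test.so")   # make test_intrin globally visible
#   import tvm
#   ...build/load the module that uses tvm.call_extern("int32", "test_intrin", ...)
# Demonstrated here on the C library:
lib = preload_global(ctypes.util.find_library("c"))
print(lib.abs(-7))  # 7
```

The key design point is ordering: the RTLD_GLOBAL load must happen before the module that references test_intrin is loaded, because a library opened with the default RTLD_LOCAL mode (as a plain load typically is) keeps its symbols private to itself.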


Thanks, this looks like what I need, I’ll confirm if it worked for me.