Tensorize in multithreaded environment


#1

Hi,
I have a small test case where I do a 2-d matrix addition.
A = tvm.placeholder((4, 16), name='input1')
B = tvm.placeholder((4, 16), name='input2')
C = tvm.compute((4, 16),
                lambda m, n: A[m, n] + B[m, n],
                name='output')

I tile on the leading dimension by 2 and parallelize as follows:

s = tvm.create_schedule(C.op)
m, n = s[C].op.axis

m_outer, m_inner = s[C].split(m, factor=2)
s[C].reorder(m_outer, m_inner, n)
s[C].parallel(m_outer)

I then match the inner matrix add with a tensorize routine, and I add my own C function that does the matrix add as a packed func.
I print the address of each pointer passed into the routine. With a single thread (TVM_NUM_THREADS=1), the pointers are updated correctly before being passed to the function.
But with two or more threads, the input pointer is not offset for any thread other than the first one.
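
To make the expected behaviour concrete, here is a minimal sketch of the pointer offsets each m_outer iteration should receive. It assumes float32 data (the default dtype of tvm.placeholder) in row-major layout, and it uses a plain ctypes buffer instead of TVM. The helper names expected_tile_offsets and expected_tile_addresses are hypothetical.

```python
import ctypes

ROWS, COLS, FACTOR = 4, 16, 2  # shape (4, 16), leading dimension split by 2


def expected_tile_offsets(rows=ROWS, cols=COLS, factor=FACTOR):
    """Element offset of the first entry of each m_outer tile (row-major)."""
    return [m_outer * factor * cols for m_outer in range(rows // factor)]


def expected_tile_addresses(buf):
    """Byte address at which each tile should start inside a ctypes buffer."""
    base = ctypes.addressof(buf)
    itemsize = ctypes.sizeof(ctypes.c_float)  # assumption: float32 elements
    return [base + off * itemsize for off in expected_tile_offsets()]


# Stand-in for one of the input buffers; not a TVM array.
buf = (ctypes.c_float * (ROWS * COLS))()
for m_outer, addr in enumerate(expected_tile_addresses(buf)):
    print("m_outer=%d expected pointer %#x" % (m_outer, addr))
```

With two tiles, the second pointer should sit 32 elements (128 bytes for float32) past the first. If every thread prints the same address, the offset is being lost somewhere between the lowered IR and the call.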


#2

Please look here for the sample code
https://github.com/dmlc/tvm/files/2295210/test.tar.gz


#3

If we need the input of tensorization to have a non-zero offset, we might need to redefine the buffer abstraction (by having buffer_alignment).


#4

Hi Tianqi,

Thanks for the response. Is it possible to pass in the loop index variables as arguments to the tensorized function?


#5

So in the output of tvm.lower after tensorize:

After tensorize
produce output {
parallel (m.outer, 0, 2) {
tvm_call_packed("test_intrin", tvm_address_of(input1[(m.outer*32)]), tvm_address_of(input2[(m.outer*32)]), tvm_address_of(output[(m.outer*32)]))
}
}

We see that tvm_address_of points to the right offset in the buffer. Can the implementation of tvm_address_of be modified to yield the right offset, taking parallelized loops into account?


#6

OK, this might have something to do with a bug in TVM code generation when supporting packed function calls in a parallel body. Can you try swapping call_packed for call_extern for now, and directly provide an extern "C" function with the same signature? That should get around the problem.


#7

Hi,

Can you give an example of how to use call_extern, please? I’m passing the output in as a pointer to the external function, so I have a void return type. Is this supported?


#8

You can simply use int as the return type for void functions and ignore the returned value; it should work.


#9

see https://github.com/dmlc/tvm/blob/master/python/tvm/intrin.py#L134


#10

Some questions:
Do you use TVM_REGISTER_GLOBAL to register the extern function?
How do you export the extern function, by loading a .so?
Do you need get_global_func?
Are the arguments to the function C++ datatypes, or TVMArgValues as for a packed func?


#11

For a normal extern function, you don’t need to register it; just make sure it is exported as extern "C" in the runtime. You cannot get it through tvm.get_global_func, as it is a typed C function.


#12

I get the following error:

LLVM ERROR: Program used external function 'test_intrin' which could not be resolved!

I declare the function as extern "C" in the .so file.

f = tvm.module.load("./test.so")

def intrin_func(ins, outs):
    body = tvm.call_extern("int32", "test_intrin", ins[0].access_ptr("rw"), ins[1].access_ptr("rw"), outs[0].access_ptr("rw"))

and here is my function

extern "C" int test_intrin(void* input, void* input2, void* output) {
    printf("input %p\n", input);  // %p is the portable format for pointers
    return 0;  // the return value is not used by the caller
}


#13

#14

Hi,

Thank you for the response. The issue I’m facing is that I want to use call_extern to invoke a function in a dynamic .so library which I link by using tvm.module.load.

In your approach you implement the extern function as llvm inline assembly, but for my case the function is implemented in the dynamic .so library and I want to call that… Is there a way to do this?

Thanks,
Anand


#15

You can, as long as the .so library is loaded prior to the tvm.module.load in the tvm runtime


#16

Hi,

The .so library is an external library which I have compiled using a c++ compiler. Now I load it using tvm.module.load to expose it to tvm runtime. But the external function implemented in the library is not resolved as indicated by the error I reported earlier.

Thanks,
Anand


#17

Please check out the example here: https://github.com/dmlc/tvm/pull/2156. As I said in my last post, it is fine as long as the .so library is loaded in the TVM runtime.

However, we do need to load the .so as an RTLD_GLOBAL object so that it is visible to other .so libraries. Alternatively, you can re-expose the function in the TVM runtime and have the TVM runtime link against your DLL.
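
As an illustration of the RTLD_GLOBAL point, here is a minimal sketch using Python's standard ctypes module rather than any TVM API. The helper names load_global and symbol_visible are hypothetical, and ./test.so is the path from the earlier post. It is POSIX-only, and whether it resolves the LLVM error in a given setup is not confirmed in this thread.

```python
import ctypes


def load_global(path):
    """dlopen a shared library with RTLD_GLOBAL so that its symbols become
    visible to libraries and JIT-compiled code loaded afterwards.
    Passing None opens the running process itself."""
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


def symbol_visible(name):
    """Return True if `name` resolves in the process-wide symbol namespace."""
    process = ctypes.CDLL(None)  # POSIX only: handle for the running program
    # ctypes raises AttributeError for unresolved symbols, which hasattr
    # turns into False.
    return hasattr(process, name)


# Usage (assumes ./test.so exports test_intrin as extern "C"):
#   load_global("./test.so")   # do this before running the TVM-built function
#   assert symbol_visible("test_intrin")
```

Checking symbol_visible("test_intrin") before invoking the compiled function is a quick way to tell a symbol-visibility problem apart from a code generation problem.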


#18

Thanks, this looks like what I need. I’ll confirm whether it works for me.