Use NNVM to parallelize a model or data across cores?

Is there documentation on how to set the hardware target to be multi-core (during a model build/compile)? I’ve been trying to figure out how to split a model inference across multiple threads, but this is hard to glean from the tutorials.

Any pointers appreciated, thanks.


What do you mean by “split a model inference across multiple threads”? If you just want to parallelize convolutions or other operators on a multi-core CPU, you can use the “llvm” target or “llvm -mcpu=core-avx2”.
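For example, a minimal sketch of an NNVM-era build for a multi-core x86 CPU (the symbol `sym`, `params`, and the input shape here are illustrative placeholders, not from this thread):

```python
import nnvm.compiler

# "llvm" alone already emits multi-threaded CPU kernels;
# -mcpu=core-avx2 additionally enables AVX2 code generation.
target = "llvm -mcpu=core-avx2"
graph, lib, params = nnvm.compiler.build(
    sym, target=target, shape={"data": (1, 3, 224, 224)}, params=params
)
```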

How do you specify the number of threads? And is it possible to run data-parallel, instead of parallelizing the kernels themselves?

You can set the environment variable TVM_NUM_THREADS. And yes: for example, you can set TVM_NUM_THREADS to half of your core count and run inference on two samples simultaneously. But you need to add a parallel_for over the outer loop yourself.


Thanks for the quick reply!

Would you know where TVM_NUM_THREADS or parallel_for are covered in the documentation?

It is not documented :)
For TVM_NUM_THREADS, you can find it here.

By parallel_for I mean you need to use OpenMP’s #pragma omp parallel for or TBB’s parallel_for.
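To make that concrete, here is a rough Python analog of the outer-loop parallel_for, using a thread pool in place of OpenMP/TBB (those apply when you drive inference from C++). It assumes artifacts from an earlier build step saved as model.json / model.so / model.params and an input named “data”; all file names, shapes, and thread counts are illustrative, and how cleanly concurrent calls share TVM’s worker pool may depend on your TVM version.

```python
import os
# Cap TVM's worker pool before the first inference creates it,
# e.g. half the cores, so two inferences can run side by side.
os.environ["TVM_NUM_THREADS"] = "4"

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tvm
from tvm.contrib import graph_runtime

# Hypothetical artifacts from an earlier nnvm.compiler.build step.
lib = tvm.module.load("model.so")
graph_json = open("model.json").read()
params_bytes = bytearray(open("model.params", "rb").read())
ctx = tvm.cpu(0)

def run_one(sample):
    # One graph runtime instance per call; modules are not thread-safe.
    mod = graph_runtime.create(graph_json, lib, ctx)
    mod.load_params(params_bytes)
    mod.set_input("data", sample)
    mod.run()
    return mod.get_output(0).asnumpy()

samples = [np.random.rand(1, 3, 224, 224).astype("float32") for _ in range(2)]
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(run_one, samples))
```

On an 8-core machine this leaves 4 worker threads per inference, so the two samples are processed side by side instead of each kernel using all 8 cores.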

Hi,

I’m wondering if I could get more info on this.
So the TVM runtime itself can run multiple threads? I thought multithreading was only used for autotuning.

The runtime has multithreading for CPU inference.

Thanks for answering.

So if I wanted to extend the VTA example to use two VTA accelerators, could this multithreading be used to offload the inference to both of them?

Sorry, I’m not familiar with VTA.