Use NNVM to parallelize a model or data across cores?

Is there documentation on how to set the hardware target to be multi-core (during a model build/compile)? I’ve been trying to figure out how to split a model inference across multiple threads, but this is hard to glean from the tutorials.

Any pointers appreciated, thanks.


What do you mean by “split a model inference across multiple threads”? If you just want to parallelize convolutions or other operators on a multi-core CPU, you can use the “llvm” target or “llvm -mcpu=core-avx2”.
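For example, a minimal sketch of an NNVM-era build for a multi-core x86 CPU (the symbol `sym`, `params`, and the input shape here are illustrative placeholders, not from this thread):

```python
import nnvm.compiler

# "llvm" alone already emits multi-threaded CPU kernels;
# -mcpu=core-avx2 additionally enables AVX2 code generation.
target = "llvm -mcpu=core-avx2"
graph, lib, params = nnvm.compiler.build(
    sym, target=target, shape={"data": (1, 3, 224, 224)}, params=params
)
```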

How do you specify the number of threads? And is it possible to run data-parallel, instead of parallelizing the kernels themselves?

You can set the environment variable TVM_NUM_THREADS. And yes: for example, you can set TVM_NUM_THREADS to half of your core count and run inference on two samples simultaneously. But you need to add a parallel_for over the outer loop yourself.


Thanks for the quick reply!

Would you know where TVM_NUM_THREADS or parallel_for are covered in the documentation?

It is not documented :)
For TVM_NUM_THREADS, you can find it here.

By parallel_for I mean you need to use OpenMP’s #pragma omp parallel for or TBB’s parallel_for.
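To make that concrete, here is a rough Python analog of the outer-loop parallel_for, using a thread pool in place of OpenMP/TBB (those apply when you drive inference from C++). It assumes artifacts from an earlier build step saved as model.json / model.so / model.params and an input named “data”; all file names, shapes, and thread counts are illustrative, and how cleanly concurrent calls share TVM’s worker pool may depend on your TVM version.

```python
import os
# Cap TVM's worker pool before the first inference creates it,
# e.g. half the cores, so two inferences can run side by side.
os.environ["TVM_NUM_THREADS"] = "4"

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tvm
from tvm.contrib import graph_runtime

# Hypothetical artifacts from an earlier nnvm.compiler.build step.
lib = tvm.module.load("model.so")
graph_json = open("model.json").read()
params_bytes = bytearray(open("model.params", "rb").read())
ctx = tvm.cpu(0)

def run_one(sample):
    # One graph runtime instance per call; modules are not thread-safe.
    mod = graph_runtime.create(graph_json, lib, ctx)
    mod.load_params(params_bytes)
    mod.set_input("data", sample)
    mod.run()
    return mod.get_output(0).asnumpy()

samples = [np.random.rand(1, 3, 224, 224).astype("float32") for _ in range(2)]
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(run_one, samples))
```

On an 8-core machine this leaves 4 worker threads per inference, so the two samples are processed side by side instead of each kernel using all 8 cores.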

Hi,

I’m wondering if I could get more info on this.
So the TVM runtime itself can run multiple threads? I thought multithreading was only used for autotuning.

The runtime has multithreading for CPU inference.

Thanks for answering.

So if I wanted to extend the VTA example to use two VTA accelerators, could this multithreading be used to offload the inference to both of them?

Sorry, I’m not familiar with VTA.