Use NNVM to parallelize a model or data across cores?


Is there documentation on how to set the hardware target to be multi-core (during a model build/compile)? I’ve been trying to figure out how to split a model inference across multiple threads, but this is hard to work out from the tutorials.

Any pointers appreciated, thanks.


What do you mean by “split a model inference across multiple threads”? If you just want to parallelize convolution or other operations on a multi-core CPU, you can use the target “llvm” or “llvm -mcpu=core-avx2”.


How do you specify the number of threads? Is it possible to run data parallel, instead of parallelizing the kernels themselves?


You can set the environment variable TVM_NUM_THREADS. And yes, data-parallel inference is possible: for example, you can set TVM_NUM_THREADS to half of your core count and run inference on two samples simultaneously. But you need to add the parallel_for over the outer loop (the loop over samples) yourself.
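A minimal Python sketch of that idea, under some assumptions: TVM_NUM_THREADS is set before TVM starts its worker threads, and `run_inference` is a hypothetical placeholder standing in for a call into your compiled module (it is not a TVM or NNVM function). The outer loop over samples is parallelized here with Python's standard `ThreadPoolExecutor` rather than OpenMP or TBB, so treat it as an illustration of the structure, not a tuned implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def half_core_count():
    # os.cpu_count() may return None, so fall back to 1.
    return max(1, (os.cpu_count() or 1) // 2)


# Assumption: this has to happen before TVM creates its thread pool,
# so set it before importing tvm / running the first kernel.
os.environ["TVM_NUM_THREADS"] = str(half_core_count())


def run_inference(sample):
    # Hypothetical placeholder for running the compiled model on one sample.
    # Replace the body with the call into your own module.
    return sum(sample)


def run_data_parallel(samples, workers=2):
    # The "outer loop" over samples, spread across `workers` caller threads.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_inference, samples))


if __name__ == "__main__":
    print(run_data_parallel([[1, 2], [3, 4]]))
```

With two workers and TVM_NUM_THREADS at half the core count, each in-flight sample would get roughly half of the machine. Whether the two calls really overlap depends on the compiled call releasing Python's GIL, which I have not checked here.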


Thanks for the quick reply!

Would you know where I could find TVM_NUM_THREADS or parallel_for in the documentation?


It is not documented :)
For TVM_NUM_THREADS you can find it here

By parallel_for I mean you need to use OpenMP’s #pragma omp parallel for or TBB’s tbb::parallel_for.