I have a C++ application that runs multiple TVM instances in parallel, and I want each instance to use 4 CPU cores to optimize execution time. For example, on a machine with 20 cores, I would run 5 TVM instances, each using only its own assigned 4 cores.
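To make the layout I have in mind concrete, here is a minimal sketch of the core partition. CoresForInstance is a hypothetical helper of my own, not a TVM API. It assumes cores are numbered from 0 and split into contiguous blocks, and it only computes the core IDs; it does not pin any threads.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of TVM): returns the core IDs that instance
// number `instance` would own when `total_cores` cores are split into
// contiguous blocks of `cores_per_instance`.
std::vector<int> CoresForInstance(int instance, int cores_per_instance,
                                  int total_cores) {
  assert(instance >= 0 && cores_per_instance > 0 && total_cores >= 0);
  std::vector<int> cores;
  const int first = instance * cores_per_instance;
  // An instance whose block would run past the last core gets no cores.
  if (first + cores_per_instance > total_cores) return cores;
  for (int c = first; c < first + cores_per_instance; ++c) {
    cores.push_back(c);
  }
  return cores;
}
```

With 20 cores and 4 cores per instance, instance 0 would get cores 0 to 3 and instance 4 would get cores 16 to 19; a sixth instance would get nothing.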
If the instances did not need to run in parallel, I could set an environment variable, e.g. export TVM_NUM_THREADS=4, to limit CPU usage for the entire application. However, I don't know the best practice for applying such a limit to each instance individually when they may run in parallel inside the same application.
Any insights? Thanks in advance!