The current auto-tuning result is very good on our Android device, with only one problem: TVM uses all the CPU cores on the device for model inference, and because our application needs to run continuously, the CPU becomes too hot in the long run and performance then drops.
So I want to ask whether I can limit TVM to use only 1-2 CPU cores on an Android device. Should I do this during auto-tuning and re-tune (so that the tuned model only uses 1-2 threads)? Or can I keep the original tuned model and just change some setting in the Android application before launching it?
One solution I’ve seen is to call "runtime.config_threadpool", which seems to suggest that re-tuning is needed.
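For reference, here is a minimal sketch of how I understand this call would be made from Python. The helper name `configure_threadpool` and the two mode constants are my own illustration, and the mode values (1 for big cores, -1 for little cores) are an assumption based on my reading of TVM's threading backend, so please correct me if they are wrong.

```python
from typing import Any, Callable

# Affinity modes as I understand them from TVM's threading backend (assumption):
BIG_CORES = 1      # prefer the big cores
LITTLE_CORES = -1  # prefer the little cores


def configure_threadpool(
    get_function: Callable[[str], Callable[..., Any]],
    mode: int,
    num_threads: int,
) -> None:
    """Look up runtime.config_threadpool and call it with (mode, num_threads).

    `get_function` is whatever resolves a TVM packed function by name,
    e.g. `remote.get_function` for an RPC session on the device.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    config = get_function("runtime.config_threadpool")
    config(mode, num_threads)


# Possible use over RPC, assuming `remote` is an already connected tvm.rpc session:
# configure_threadpool(remote.get_function, LITTLE_CORES, 2)
```

The lookup function is passed in so the same helper could be used with an RPC session during tuning or with a local runtime; whether the tuned schedules stay optimal after this call is exactly what I am unsure about.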
The other is to set the TVM_NUM_THREADS environment variable, which seems not to require re-tuning?
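To make the second option concrete, this is a small sketch of what I have in mind, assuming the variable has to be in the process environment before the TVM runtime creates its thread pool. The helper name `limit_tvm_threads` is hypothetical, not part of TVM.

```python
import os


def limit_tvm_threads(num_threads: int) -> str:
    """Set TVM_NUM_THREADS for the current process and return the stored value."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    os.environ["TVM_NUM_THREADS"] = str(num_threads)
    return os.environ["TVM_NUM_THREADS"]


# Set the limit first, then load TVM, so the runtime sees the value
# when it starts its worker threads.
limit_tvm_threads(2)
# import tvm  # only import after the variable is set
```

In the Android application itself I assume the equivalent would be to set the same variable in the app process before the TVM native library is loaded, but I have not verified that.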