Hi. I compared the CPU load while running TF and TVM on a ResNet-like model. For the TVM run, the model was exported from TensorFlow and compiled with all default settings. Below is the htop output for both cases. One can see that while TF utilizes all CPUs, TVM utilizes only half of them. Which options affect this behavior?
Thanks! At first glance, if the idea was to use only physical cores, then dividing by 2 is probably not enough. Of course, one would have to study the Linux kernel's CPU enumeration scheme to tell for sure.
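For what it's worth, the physical-vs-logical split can be read directly from sysfs instead of guessing at the enumeration scheme. A minimal sketch (Linux-only, assumes the standard sysfs topology layout):

```python
import os
from pathlib import Path

# Distinct physical cores: logical CPUs sharing the same
# (package_id, core_id) pair are SMT/hyperthread siblings.
cores = set()
for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
    topo = cpu / "topology"
    if topo.exists():
        pkg = (topo / "physical_package_id").read_text().strip()
        core = (topo / "core_id").read_text().strip()
        cores.add((pkg, core))

logical = os.cpu_count()
print(f"logical CPUs: {logical}, physical cores: {len(cores)}")
```

If the physical count is not exactly half the logical count (e.g. SMT disabled on some cores, or no SMT at all), dividing by 2 would indeed be wrong.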
Here are the updated TVM results, after setting TVM_NUM_THREADS=40.
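One caveat worth mentioning: as far as I understand, TVM's thread pool reads TVM_NUM_THREADS when the runtime first spins up, so the variable has to be set before that happens. A sketch, assuming a Python driver script:

```python
import os

# Set before the TVM runtime is first used; the thread pool
# reads TVM_NUM_THREADS at startup, so setting it later has no effect.
os.environ["TVM_NUM_THREADS"] = "40"

# import tvm  # any runtime created after this point should see 40 threads
print(os.environ["TVM_NUM_THREADS"])
```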
The TVM results became less stable this way (see the rising standard deviation below). Probably some synchronisation issue? Here are the results; time is per iteration, in seconds:
TVM_NUM_THREADS is undefined:
tf running time : 0.079108742531389 +- 0.002206700763389244
tvm running time : 0.09196180960163475 +- 0.015231218257996748
TVM_NUM_THREADS=40
tf running time : 0.07508971206843854 +- 0.002301754041790758
tvm running time : 0.10551985777914524 +- 0.05513929844949404
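For reference, here is a minimal sketch of how such per-iteration mean +- std figures can be collected. The harness below is hypothetical (a dummy workload stands in for the actual model call), not the exact script used above:

```python
import time
import statistics

def benchmark(run_once, n_iter=100, warmup=10):
    """Time run_once() per iteration; return (mean, stdev) in seconds."""
    for _ in range(warmup):        # warm up caches / JIT / thread pools
        run_once()
    times = []
    for _ in range(n_iter):
        t0 = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - t0)
    return statistics.mean(times), statistics.stdev(times)

# Dummy workload in place of the real TF/TVM inference call.
mean, std = benchmark(lambda: sum(range(10000)))
print(f"running time : {mean} +- {std}")
```

A large stdev relative to the mean, as in the TVM_NUM_THREADS=40 case, usually points to occasional slow iterations rather than uniformly slower execution, which is consistent with threads contending or being descheduled.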