TVM is slower than MXNet for batched forward passes


#1

I am running an ImageNet classification model (resnet34) in GPU mode.

In TVM, an input of shape (1 x 3 x 224 x 224) takes 2.9 ms per run, which is faster than MXNet (3.5 ms).

But the same model takes 17 ms with shape (10 x 3 x 224 x 224) in TVM,
while MXNet takes 14 ms with shape (10 x 3 x 224 x 224).
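
To make the comparison easier to read, the timings above can be converted to a per-image cost. This is only a small arithmetic sketch in plain Python; the helper names `per_image_ms` and `batching_gain` are made up for illustration and are not part of TVM or MXNet.

```python
# Latencies quoted in this post, in milliseconds per forward pass,
# keyed by framework and then by batch size.
latency_ms = {
    "tvm": {1: 2.9, 10: 17.0},
    "mxnet": {1: 3.5, 10: 14.0},
}


def per_image_ms(framework, batch_size):
    """Average time spent on one image at the given batch size."""
    return latency_ms[framework][batch_size] / batch_size


def batching_gain(framework, batch_size=10):
    """How many times cheaper one image becomes when batched."""
    return per_image_ms(framework, 1) / per_image_ms(framework, batch_size)


for name in latency_ms:
    print(
        f"{name}: {per_image_ms(name, 10):.2f} ms/image at batch 10, "
        f"{batching_gain(name):.2f}x cheaper than batch 1"
    )
```

By these numbers both frameworks get cheaper per image when batched (about 1.7 ms for TVM and 1.4 ms for MXNet at batch 10), but MXNet gains more from batching than TVM does.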

Is this reasonable?


#2

I’m facing a similar situation with TVM vs. TF for batched forward passes. After auto-tuning Inception v3 for 1000 trials, TVM takes around 36 ms at batch_size=8 versus 39 ms for TF, and I get warning messages of the form "can not find config for cuda …".
I’m increasing the number of tuning trials to see whether the warning messages still appear.


#3

Well, if you’re using cuDNN, it’s actually a pretty strong baseline. Currently TVM schedules are mostly optimized for batch-size 1, targeting low-latency inference.
AutoTVM could help though.


#4

There seem to be two main shortcomings of TVM at the moment for anyone who wants to deploy it on a production online GPU server:
1、Dynamic batch_size. This is a key feature: with a preprocessing pipeline in front of the model, the server can feed in a variable number of images, decided at run time according to the incoming requests and the network conditions at that moment. That way it benefits from batch_size>1 acceleration while still keeping latency low.
2、Batch acceleration. If “Currently TVM schedules are mostly optimized for batch-size 1, targeting low-latency inference”, as you said, then TVM’s batched performance may need improvement as well. There are many tools for batched inference, and their overall performance will beat TVM at batch_size=1.
Many thanks.


#5

I agree. For the first point, I believe @haichen @zhiics @jroesch @wweic are working on dynamic batching at the moment.


#6

@Lee How did you achieve batching in TVM? By running ‘module.run()’ N times?


#7

No, by making the first dimension of the input ndarray equal to the batch size.


#8

@yzhliu I don’t think that’s right. I’m compiling mv2 with TVM. At compile time I gave the input shape (1,3,224,224).
But at runtime, when I try to pass 17 images as (17,3,224,224), I get this error:
ValueError: array shape do not match the shape of NDArray (17, 3, 224, 224) vs (1, 3, 224, 224)

So I think for 17 images I have to run the module 17 times. Is that true?


#9

Actually, you need to compile with shape (17, 3, 224, 224). Dynamic shapes are not supported at the moment.
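
If the number of images varies between requests, one possible workaround (an assumption on my part, not something TVM does for you) is to compile for one fixed batch size and pad the last chunk up to that size. Below is a minimal sketch of just the bookkeeping in plain Python, with no TVM calls; `plan_batches` and the batch size of 8 are hypothetical and only for illustration.

```python
def plan_batches(num_images, compiled_batch):
    """Split num_images into chunks for a model compiled with a fixed batch size.

    Returns a list of (start, stop, pad) tuples: images[start:stop] go into
    one run, followed by `pad` dummy images so that the input matches the
    compiled shape exactly.
    """
    if num_images < 0 or compiled_batch <= 0:
        raise ValueError("num_images must be >= 0 and compiled_batch must be > 0")
    plan = []
    for start in range(0, num_images, compiled_batch):
        stop = min(start + compiled_batch, num_images)
        pad = compiled_batch - (stop - start)
        plan.append((start, stop, pad))
    return plan


# 17 images against a (hypothetical) module compiled for batch size 8.
for start, stop, pad in plan_batches(17, 8):
    print(f"run on images[{start}:{stop}] plus {pad} padding image(s)")
```

With these numbers the module would run 3 times instead of 17, and the outputs for the 7 padding rows of the last run would be discarded. Compiling for exactly 17, as suggested above, needs a single run and no padding.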