Minibatch support for GPU inference

Hi,

We are interns at eBay, and our team is interested in applying the NNVM/TVM stack to our platform. We ran some benchmarks on GPUs and found that batch_size > 1 improves our throughput significantly. Below are our benchmark results for ResNet50 on a P100:

| batch_size | throughput |
|------------|------------|
| 1          | 95.9       |
| 2          | 180.5      |
| 4          | 324.3      |
| 8          | 451.5      |
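
For context, here is a minimal sketch of how the batch size enters the NNVM/TVM compile-and-run path (not our exact harness; it assumes the `nnvm.testing` ResNet workload, the CUDA target, and the TVM graph runtime):

```python
import numpy as np
import nnvm.compiler
import nnvm.testing
import tvm
from tvm.contrib import graph_runtime

batch_size = 8                      # varied over 1, 2, 4, 8
data_shape = (batch_size, 3, 224, 224)

# ResNet-50 workload with random weights.
net, params = nnvm.testing.resnet.get_workload(
    batch_size=batch_size, num_layers=50)

# Compile for the GPU; the batch size is fixed by the input shape.
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        net, target="cuda", shape={"data": data_shape}, params=params)

ctx = tvm.gpu(0)
module = graph_runtime.create(graph, lib, ctx)
module.set_input("data", tvm.nd.array(
    np.random.uniform(size=data_shape).astype("float32")))
module.set_input(**params)

# Time end-to-end runs and report throughput as images per second.
timer = module.module.time_evaluator("run", ctx, number=100)
print("throughput: %.1f images/sec" % (batch_size / timer().mean))
```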

We were wondering whether you have any plans to support batch inference. If not, what are the current technical challenges?

Thanks a lot!

I am hoping that the upcoming autotvm integration for the CUDA backend will remove the batch-1 limitation. Winograd convolution is coming as well; for Winograd, supporting batch > 1 is trivial, and the larger the batch size, the better.

@merrymercy Can you comment?

The first PR on autotvm will only support batch size = 1.

We are working on the schedule for batch inference. Hopefully we can support it soon.

@merrymercy

Any progress on supporting batch size > 1?

The templates in the master branch now support arbitrary batch sizes.
In terms of performance, I think it is good, at least for batch sizes < 4.
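
If you want to try it, the batch size simply flows in through the input shape when extracting tuning tasks, along these lines (a rough sketch in the style of the nnvm autotvm tutorial; exact arguments may differ):

```python
import nnvm
import nnvm.compiler
import nnvm.testing
import tvm
from tvm import autotvm

batch_size = 4
input_shape = (batch_size, 3, 224, 224)
net, params = nnvm.testing.resnet.get_workload(
    batch_size=batch_size, num_layers=50)

# Extract conv2d tuning tasks; the batch size is carried in the shape dict.
tasks = autotvm.task.extract_from_graph(
    net, target=tvm.target.cuda(), shape={"data": input_shape},
    dtype="float32", symbols=(nnvm.sym.conv2d,))
print("%d tasks to tune" % len(tasks))
```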
