Hi,
We are interns at eBay, and our team is interested in applying the NNVM/TVM stack to our platform. We ran some benchmarks on GPUs and found that batch_size > 1 improves our throughput significantly. Here are our ResNet50 results on a P100:
batch_size | throughput (images/sec)
1          | 95.9
2          | 180.5
4          | 324.3
8          | 451.5
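For reference, a quick sketch (our own calculation, not part of any TVM API) of how we read these numbers: throughput at each batch size divided by linear scaling of the batch_size=1 throughput, showing how the per-batch gain tapers off:

```python
# Throughput numbers from our ResNet50 benchmark on a P100 (images/sec).
throughput = {1: 95.9, 2: 180.5, 4: 324.3, 8: 451.5}

# Efficiency relative to perfect linear scaling from batch_size=1.
base = throughput[1]
efficiency = {b: t / (b * base) for b, t in throughput.items()}

for b in sorted(efficiency):
    print(f"batch_size={b}: {efficiency[b]:.2f} of linear scaling")
```

The efficiency drops from ~0.94 at batch_size=2 to ~0.59 at batch_size=8, so batching still helps a lot in absolute throughput even as the marginal gain shrinks.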
We were wondering whether you have any plans to support batch inference. If not, what are the current technical challenges?
Thanks a lot!