C++ API test on an NVIDIA GPU with different batch_size and repeat counts: run time increased

tracy1 · March 23, 2019, 8:10am

I wanted to measure the run time of a model with different batch_size values using the C++ API.
First, I converted a Caffe model (VGG16) to MXNet and compiled it.
Then, I ran it with different batch_size values and repeat counts.
Finally, I found that when batch_size = 64, the average time per run increased with the repeat count.
Also, freeing the GPU DLTensor (input_gpu) took more time, and that time also grew with the repeat count.
Can anyone help me? Thanks!
Part of the code:
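// needs <sys/time.h> for gettimeofday; mod is the loaded graph runtime module
tvm::runtime::PackedFunc run = mod.GetFunction("run");
struct timeval start, end;
gettimeofday(&start, NULL);
for (int i = 0; i < repeat; ++i) {
  run();  // execute the whole graph once
}
gettimeofday(&end, NULL);
// elapsed wall-clock time of the whole loop, in ms
double runtime_ms = (end.tv_sec - start.tv_sec) * 1000.0 + (end.tv_usec - start.tv_usec) / 1000.0;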
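A possible explanation, offered as an assumption rather than a diagnosis: on CUDA, kernel launches are asynchronous, so run() may return before the GPU has finished, and a timer around the loop above can end up measuring mostly launch overhead. The queued work would then be paid for by the next call that waits on the device, which could be the free of input_gpu. The sketch below is a hypothetical helper (AverageRunMs is not a TVM function) that calls a user-supplied sync step after a warm-up and again before stopping the clock, so enqueued work is counted in the measurement.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of TVM): returns the average wall-clock
// milliseconds per call of `run` over `repeat` calls.
// `sync` must block until all work issued by `run` has finished; it is
// called after the warm-up calls and again before the clock is read, so
// work that `run` only enqueued is still counted.
double AverageRunMs(const std::function<void()>& run,
                    const std::function<void()>& sync,
                    int repeat, int warmup = 1) {
  assert(repeat > 0);
  for (int i = 0; i < warmup; ++i) {
    run();  // first calls may include one-time setup costs
  }
  sync();  // keep warm-up work out of the timed region
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeat; ++i) {
    run();
  }
  sync();  // wait for queued work before stopping the clock
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> elapsed = end - start;
  return elapsed.count() / repeat;
}
```

With TVM, the call might look like `AverageRunMs([&] { run(); }, [] { TVMSynchronize(kDLGPU, 0, nullptr); }, repeat);`, assuming the GPU is device 0, the default stream is used, and TVMSynchronize is taken from tvm/runtime/c_runtime_api.h. If the asynchronous-launch assumption is right, the per-run average should stay roughly flat as repeat grows, and the free of input_gpu should no longer absorb the backlog.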

Result (all times in ms; runtime is the whole loop):
batch_size  repeat  runtime  freetime(input_gpu)
1           1       1        4
1           10      2        43
1           100     349      78
1           1000    4098     79

64          1       1        364
64          10      1        3506
64          100     22952    11889
64          1000    340881   12023
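To make "the average time increased" concrete, the sketch below divides each reported runtime by its repeat count. The struct and function names are illustrative only; the numbers are copied from the results above. A per-run cost that jumps from well under a millisecond to hundreds of milliseconds is hard to reconcile with a steady VGG16 forward pass at batch size 64, which is consistent with (though it does not prove) the small-repeat timings capturing only kernel launches.

```cpp
#include <cassert>
#include <vector>

// One row of the results: total wall-clock time (ms) measured
// around `repeat` calls of run().
struct Measurement {
  int batch_size;
  int repeat;
  double total_ms;
};

// Average milliseconds per run() call implied by a row.
double PerRunMs(const Measurement& m) {
  assert(m.repeat > 0);
  return m.total_ms / m.repeat;
}

// The runtime column as reported in the post (freetime omitted).
std::vector<Measurement> ReportedResults() {
  return {
      {1, 1, 1.0},  {1, 10, 2.0},  {1, 100, 349.0},    {1, 1000, 4098.0},
      {64, 1, 1.0}, {64, 10, 1.0}, {64, 100, 22952.0}, {64, 1000, 340881.0},
  };
}
```

For batch_size=64 this gives 0.1 ms per run at repeat=10, 229.52 ms at repeat=100 and 340.881 ms at repeat=1000; for batch_size=1 it gives 0.2, 3.49 and 4.098 ms at the same repeat counts.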