TVM's get_output function is time-consuming with Mali OpenCL on RK3399

titikid · October 11, 2018, 11:28am

@merrymercy
Today I ran into very bad performance with the mobilenet_ssd_300 model (without the priorbox and detection layers): the “run” function takes about 1.9 s. I can no longer get the 230 ms I reported before, and I don’t know why.
I also tried to reproduce your benchmark results, and here is what I got:

mobilenet: ~80ms
resnet18: 0.19822525763333332 s
vgg16: 0.98832358445 s

The resnet18 and vgg16 results differ a lot from the results here.
Here is my script. I don’t use RPC.
Is there something wrong with my implementation?

eqy · October 12, 2018, 8:40pm

Can you check whether you have the current configs for Mali from tophub?

titikid · October 15, 2018, 9:03am

@eqy I checked out the latest TVM code and I can reproduce the mobilenet_ssd_300 performance I reported before (250 ms).
I found that we don’t have pre-tuned parameters for this model. So will the performance be better if I use AutoTVM?

merrymercy · October 15, 2018, 4:31pm

sure.

eqy · October 15, 2018, 6:30pm

Yes, there are no guarantees about using fallback configs. The performance will very likely be better with AutoTVM.

kgomaa · November 4, 2018, 8:25am

@merrymercy Is this the case with CPU run as well?

yzhliu · November 4, 2018, 6:17pm

On CPU, “run” should cost the “real” running time.
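
One reading of the reply above is that on a GPU target “run” may return before the work has finished, so the wait shows up in get_output instead. The sketch below only simulates that behaviour with the Python standard library; `FakeAsyncModule`, `kernel_seconds`, and `time_call` are hypothetical names made up for illustration and are not TVM's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeAsyncModule:
    """Hypothetical stand-in for a GPU runtime module (not TVM's API)."""

    def __init__(self, kernel_seconds):
        self._kernel_seconds = kernel_seconds
        # A single worker thread plays the role of the device command queue.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def _compute(self):
        # Pretend the device needs kernel_seconds to finish the network.
        time.sleep(self._kernel_seconds)
        return [1.0, 2.0, 3.0]

    def run(self):
        # Enqueue the work and return immediately, like an asynchronous launch.
        self._pending = self._pool.submit(self._compute)

    def get_output(self):
        # Block until the queued work is done, then hand back the result.
        return self._pending.result()

    def close(self):
        self._pool.shutdown()


def time_call(fn):
    """Return (elapsed seconds, result) for a single call of fn."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


module = FakeAsyncModule(kernel_seconds=0.2)
run_time, _ = time_call(module.run)
output_time, output = time_call(module.get_output)
module.close()

print("run:        %.4f s" % run_time)
print("get_output: %.4f s" % output_time)
```

In this simulation nearly all of the elapsed time is attributed to get_output, even though the work was started by run. Under that assumption, only the sum of the two calls is a meaningful per-inference number.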

kaishi · December 1, 2018, 6:47pm

@titikid

I have an RK3399 board and would like to reproduce your mobilenet-ssd result.
Can you show me the steps for doing this and the best time you have got?

Thanks,

daweili1226 · December 20, 2018, 3:18am

This is also urgent for me :grinning:

daweili1226 · December 20, 2018, 3:59am

Can we reduce the time cost by getting results back from the GPU asynchronously, instead of waiting for all outputs to be ready?