TVM's get_output function is time-consuming with Mali OpenCL on RK3399

@merrymercy
Today I ran into very bad performance with the mobilenet_ssd_300 model (without the priorbox and detection layers): the “run” function takes about 1.9s. I could not get the 230ms I reported before, and I don’t know why.
Separately, I tried to reproduce your benchmark results. Here is what I got:

  • mobilenet: ~80ms
  • resnet18: 0.19822525763333332 s
  • vgg16: 0.98832358445 s

The resnet18 and vgg16 results differ a lot from the results here.
Here is my script. I don’t use RPC.
Is there something wrong with my implementation?

Can you check whether you have the current configs for Mali from tophub?

@eqy I checked out the latest TVM code and I can reproduce the mobilenet_ssd_300 performance I reported before (250ms).
I found that we don’t have pre-tuned parameters for this model. So will the performance be better if I use AutoTVM?

Sure.

Yes, there are no guarantees about using fallback configs. The performance will very likely be better with AutoTVM.

@merrymercy Is this also the case when running on CPU?

On CPU, the time measured for “run” should be the “real” running time.
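
To illustrate why the time can show up in get_output rather than in “run” on a GPU, here is a minimal sketch. It assumes the device queues work asynchronously, so “run” returns once the work is enqueued and the first call that needs the data blocks until the device is done. This is a pure-Python simulation, not TVM code: `FakeAsyncDevice`, `timed` and the 0.2 s kernel time are hypothetical stand-ins.

```python
import threading
import time


class FakeAsyncDevice:
    """Hypothetical stand-in for a GPU queue: run() enqueues, get_output() waits."""

    def __init__(self, kernel_seconds):
        self.kernel_seconds = kernel_seconds
        self._worker = None
        self._result = None

    def _kernel(self):
        time.sleep(self.kernel_seconds)  # pretend the device is computing
        self._result = 42

    def run(self):
        # Returns immediately; the work continues in the background.
        self._worker = threading.Thread(target=self._kernel)
        self._worker.start()

    def get_output(self):
        # Blocks until the queued work has finished.
        self._worker.join()
        return self._result


def timed(fn):
    """Call fn() and return (value, elapsed seconds)."""
    start = time.perf_counter()
    value = fn()
    return value, time.perf_counter() - start


device = FakeAsyncDevice(kernel_seconds=0.2)
_, run_seconds = timed(device.run)
output, get_seconds = timed(device.get_output)
print(f"run: {run_seconds:.3f} s, get_output: {get_seconds:.3f} s")
```

In this simulation almost all of the elapsed time is attributed to get_output, even though the work was started by run. If the same thing happens on the board, the sum of the two is the number to compare against a CPU run.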

@titikid

I have an RK3399 board and would like to reproduce your mobilenet-ssd result.
Can you show me the steps for doing this and the best time you have got?

Thanks,

This is also urgent for me. 😀

Can we reduce the time cost by fetching results back from the GPU asynchronously, instead of waiting for all outputs to be ready?
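
To make the idea concrete, here is a rough sketch of "handle each output as soon as it is ready". It only simulates the pattern with Python's `concurrent.futures`; `fetch_output` and the delays are made up, and whether the Mali OpenCL runtime in TVM can report readiness per output is an open question in this thread, not something this example shows.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_output(index, delay_seconds):
    """Hypothetical per-output copy that becomes ready after delay_seconds."""
    time.sleep(delay_seconds)
    return index


# Simulated readiness times for three outputs (made-up values).
delays = {0: 0.30, 1: 0.05, 2: 0.15}

arrival_order = []
with ThreadPoolExecutor(max_workers=len(delays)) as pool:
    futures = [pool.submit(fetch_output, i, d) for i, d in delays.items()]
    for future in as_completed(futures):
        # Post-process each output as soon as it is ready,
        # instead of waiting for the slowest one first.
        arrival_order.append(future.result())

print(arrival_order)
```

This does not shorten the time until the last output arrives; it only lets host-side post-processing overlap with the outputs that are still pending.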