SSD GluonCV: Incorrect inference result on OpenCL ARM Mali

I compiled the GluonCV SSD model ssd_512_mobilenet1.0_voc for two targets: llvm (x86) and tvm.target.mali('rk3399').

Both compilations succeeded, but the inference results differ.
I used the image from the GluonCV tutorial, street_small.jpg, preprocessed with gluoncv.data.transforms.presets.ssd.load_test(short=512).

### llvm x86
# The result matches the original GluonCV SSD model
# 9 objects with score >= 50% - 1 bicycle, 2 car, 6 person
# score  class_id label bbox
0.99883384 1 bicycle [311.92276 280.6423  480.24875 395.4282 ]
0.9818761 6 car [256.52148 226.02039 375.46075 297.76758]
0.93540573 14 person [ 17.623653 212.96735   95.69469  362.3305  ]
0.8603827 14 person [349.0219  198.23361 445.78577 399.68332]
0.83436245 14 person [130.7037  220.82726 145.0582  268.93137]
0.7953389 6 car [212.75687 212.65344 284.37183 290.7951 ]
0.7874611 14 person [145.45284 223.99316 158.44994 267.89554]
0.712249 14 person [184.73701 207.67241 242.56184 327.2542 ]
0.6019558 14 person [170.67674 217.86595 199.24776 299.4095 ]

### tvm.target.mali('rk3399')
# 100 objects with score >= 50% - 20 bicycle, 2 car, 5 person, 73 aeroplane
# lots of objects with score > 100%
# score  class_id label bbox
1.5823054 0 aeroplane [1.0612862  0.6428509  0.33811423 0.        ]
1.5558598 0 aeroplane [0.7675109 0.5182781 0.2734055 0.       ]
1.545749 1 bicycle [1.2887409  1.0955629  0.89454305 0.7174907 ]
1.5409416 1 bicycle [1.107888  1.1792644 1.4139477 1.3293245]
1.4165628 0 aeroplane [2.0808587 2.0669181 1.6659384 1.3989905]
1.340633 1 bicycle [0.9456403 0.9181117 1.2002937 1.0714405]
1.3251178 1 bicycle [1.1574025  1.23634    1.0462368  0.73867166]
1.2862756 1 bicycle [0.         0.         0.         0.20761052]
1.2726729 1 bicycle [0.9707655  0.70205534 0.6896615  0.49442405]
1.2660424 1 bicycle [0.8439723 0.9891814 1.1026793 1.1307108]
1.2225115 0 aeroplane [0.8461864  0.12424816 0.         0.        ]
1.190823 0 aeroplane [1.4713199  1.4375634  0.46537843 0.        ]
1.1863184 0 aeroplane [1.1920383  0.5074234  0.2274646  0.08333829]
1.1355017 1 bicycle [0.96647227 0.35146737 0.21094643 0.26085588]
1.1335651 0 aeroplane [0.54785633 0.41987288 0.26876858 0.0058417 ]
1.1303611 1 bicycle [0.9115986  1.5116146  0.83921695 0.62559855]
1.1276394 0 aeroplane [0.8510597  0.33030063 0.8345822  0.3915038 ]
1.1084547 0 aeroplane [1.1133226  0.9229205  0.5499488  0.30741337]
1.1072786 0 aeroplane [1.0158645  0.21140838 0.         0.15299392]
1.1025088 1 bicycle [0.4311658  0.4416018  0.21874952 0.24016196]
1.0849174 0 aeroplane [0.87456214 0.7318405  0.75175357 0.90420747]
1.036535 0 aeroplane [1.4980737 1.5669382 1.5268793 1.1016318]
1.0251759 0 aeroplane [0.7962408 1.2129316 1.2494624 0.9876696]
1.0186558 0 aeroplane [0.9292245 0.865847  0.8836466 1.3513273]
1.0065148 0 aeroplane [0.4001711 0.0413961 0.        0.       ]
1.0005158 0 aeroplane [0.68182206 0.9958501  0.52712834 0.24823435]
0.99883384 1 bicycle [311.92273 280.6423  483.44757 366.8985 ]
0.9978578 0 aeroplane [0.8149737  0.47029173 0.49533165 0.32302207]
0.9929792 0 aeroplane [0.7763098  0.15265542 0.         0.        ]
0.9818761 6 car [256.52148 226.02036 339.73038 323.5351 ]
0.9815484 0 aeroplane [0.88672113 0.4353604  0.27727872 0.4599397 ]
0.9707349 1 bicycle [0.53209877 0.9191654  0.9622632  0.6007136 ]
0.9457397 0 aeroplane [0.44022727 0.         0.         0.        ]
0.93640554 0 aeroplane [1.1516927  1.2499415  0.97149754 0.42520648]
0.9354051 14 person [ 17.62365 212.96735 164.82767 327.81332]
0.93506193 0 aeroplane [0.50178397 0.2142425  0.         0.        ]
0.92934954 0 aeroplane [0.87120676 0.43420005 0.4443583  0.7075026 ]
0.92929757 0 aeroplane [0.9707571  0.330072   0.51393443 0.34264034]
0.90576226 0 aeroplane [0.70162296 0.06581405 0.         0.        ]
0.90160996 0 aeroplane [0.69183666 0.42637348 0.22179589 0.        ]
0.898031 0 aeroplane [0.71883494 0.510046   0.6338638  1.0645595 ]
0.8898927 0 aeroplane [0. 0. 0. 0.]
0.87768936 0 aeroplane [0.55049384 0.12136862 0.03813171 0.        ]
0.8763647 0 aeroplane [0.50491345 0.2015484  0.14727682 0.6608703 ]
0.87509793 0 aeroplane [0.59303546 0.26502514 0.5189958  0.04242197]
0.8734181 0 aeroplane [0.8476893  0.5869044  0.47991008 0.40104437]
0.8603797 14 person [349.0219  198.23355 512.15173 324.12656]
0.8516257 1 bicycle [1.3946235  1.0379905  0.904202   0.78918004]
0.84934616 1 bicycle [0.47831392 0.         0.23665243 0.63362116]
0.8466381 1 bicycle [0.77948475 0.78839695 0.6248063  0.46815783]
0.83639205 0 aeroplane [0.9563688  1.1468122  1.1124576  0.53170276]
0.83436203 14 person [130.7037  220.82726 145.0582  268.93137]
0.8226299 0 aeroplane [1.2033314  1.0960646  0.75911355 0.2228817 ]
0.8204712 0 aeroplane [0.41749817 0.16716182 0.0720408  0.02681321]
0.81481385 0 aeroplane [1.2920474 1.393214  1.2269064 0.8116064]
0.7953339 6 car [212.75685 212.65344 284.37183 290.7951 ]
0.78746045 14 person [145.45285 223.99316 158.44992 267.89554]
0.78290105 0 aeroplane [0.84571123 0.4586707  0.17901507 0.        ]
0.7721739 0 aeroplane [0.21182677 0.20125294 0.         0.        ]
0.76801527 0 aeroplane [0.34975743 0.6874822  1.1797105  1.1632812 ]
0.7662051 0 aeroplane [0.77315474 0.84332466 0.9398644  0.9426873 ]
0.7590816 0 aeroplane [0.9537035  0.42177492 0.2114219  0.09264582]
0.75160646 0 aeroplane [0.58506113 0.588799   0.22396028 0.20704624]
0.75092673 0 aeroplane [0.16874933 0.         0.         0.        ]
0.74778306 1 bicycle [0.        0.        0.0788736 0.8791949]
0.74349463 0 aeroplane [0.67440987 0.26185703 0.11416936 0.4658394 ]
0.73532224 0 aeroplane [0.75051963 1.0833898  1.0806832  1.1322415 ]
0.7333882 0 aeroplane [0.71443284 0.600847   0.2782367  0.26314145]
0.72079766 0 aeroplane [0.7363094  0.57217395 0.         0.        ]
0.7133771 0 aeroplane [0.9515971  1.2044042  0.63017833 0.26459414]
0.7122461 14 person [184.737   207.67242 292.65894 294.5574 ]
0.70781684 0 aeroplane [1.0500189 1.282826  1.263995  1.2673242]
0.706195 0 aeroplane [0.6051427  0.43965822 0.57344115 0.7641499 ]
0.7034395 0 aeroplane [0.15905681 0.26184222 0.12347919 0.32377112]
0.6984519 0 aeroplane [0.9621873  0.86146784 0.35820213 0.        ]
0.69532025 1 bicycle [305.9198  275.57004 443.07428 457.44806]
0.6938748 0 aeroplane [0.70804965 0.6665212  0.44574636 0.52361226]
0.69147974 0 aeroplane [0.15398383 0.09417512 0.28831327 1.0575197 ]
0.68810093 0 aeroplane [0.8449187  0.61322176 0.06345853 0.        ]
0.68150485 1 bicycle [0.46998554 0.22141188 0.11616701 0.0257147 ]
0.669511 0 aeroplane [0.8097224 0.9455615 0.5052085 0.2243784]
0.6639271 0 aeroplane [0.6898171  0.4062506  0.02646595 0.        ]
0.6619834 0 aeroplane [0.41782573 0.         0.         0.        ]
0.6546775 0 aeroplane [0.62482363 0.53851414 0.05290025 0.29209018]
0.654259 1 bicycle [0.31463876 0.         0.15885617 0.5382549 ]
0.65274 0 aeroplane [0.59100676 0.16067621 0.         0.        ]
0.6479243 0 aeroplane [0.84906495 1.4042575  1.094893   0.6851456 ]
0.6442888 0 aeroplane [0.8200085 0.8635886 0.8926958 0.6045233]
0.64018744 0 aeroplane [0.8244102  0.8539413  0.16885157 0.05142185]
0.6392654 0 aeroplane [0.16514444 0.25101793 0.13594401 0.32062298]
0.63499784 0 aeroplane [0.23711498 0.         0.         0.        ]
0.62918806 0 aeroplane [0.20165402 0.         0.         0.35375488]
0.6279786 0 aeroplane [0.79454845 1.2066615  1.0915481  0.95324254]
0.6236094 0 aeroplane [1.2512031  1.0596199  0.41794193 0.99301   ]
0.61735773 0 aeroplane [0.20427014 0.14911151 0.         0.        ]
0.61197674 0 aeroplane [0.         0.         0.43294704 0.2212234 ]
0.6113349 0 aeroplane [0.39956802 0.34553128 0.09166968 0.        ]
0.60923463 0 aeroplane [0.3598666  0.21451873 0.4378687  0.16640624]
0.608783 0 aeroplane [0.83296394 0.70367634 0.89337784 1.6359301 ]
0.60432965 1 bicycle [0.17938468 0.0156461  0.45149958 0.5475396 ]
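
The broken rows in the Mali output can be spotted mechanically: a score above 1.0 is impossible for a softmax output, and many boxes have a right or bottom edge that is not greater than the left or top edge. Below is a minimal sketch of such a check in plain Python. The helper name `find_invalid_detections` is hypothetical, and the code assumes the bbox layout is (xmin, ymin, xmax, ymax); the sample rows are copied from the listings above.

```python
def find_invalid_detections(detections):
    """Return (index, reason) pairs for rows that cannot be valid detections.

    Each detection is (score, class_id, box).
    Assumption: box is laid out as (xmin, ymin, xmax, ymax).
    """
    problems = []
    for i, (score, class_id, box) in enumerate(detections):
        xmin, ymin, xmax, ymax = box
        if not 0.0 <= score <= 1.0:
            # A confidence score must lie in [0, 1].
            problems.append((i, "score outside [0, 1]"))
        elif xmax <= xmin or ymax <= ymin:
            # The box has zero or negative width or height.
            problems.append((i, "degenerate box"))
    return problems


sample = [
    # Row from the llvm x86 output
    (0.99883384, 1, (311.92276, 280.6423, 480.24875, 395.4282)),
    # Rows from the Mali output
    (1.5823054, 0, (1.0612862, 0.6428509, 0.33811423, 0.0)),
    (0.9978578, 0, (0.8149737, 0.47029173, 0.49533165, 0.32302207)),
]

problems = find_invalid_detections(sample)
print(problems)
```

On these three rows the check flags the second row for its score and the third for its box, while the llvm row passes.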

@tqchen @Laurawly Can you have a look?

What about execution time? Can you show performance results, please?

@Laurawly and I tested the model using heterogeneous execution: we fell back the get_valid_counts op to the CPU and kept all the other ops on the Mali GPU. With that setup the results are correct, so there may be a bug in the Mali GPU implementation of the get_valid_counts op.

To get correct results on the Mali GPU, recompile the model and place the vision.get_valid_counts operator on arm_cpu, because this operator does not work correctly on the Mali GPU.

Code example (Thank you @zhiics) https://gist.github.com/apivovarov/7c46dc82ce01d8ed639a46e9ee94e5c5

You also need a recent TVM build that includes PR-3311.

To fix the 15-minute hang on the first run, you also need PR-3268 (thank you, @Laurawly).

To run the compiled model on the edge device, specify both the CPU and OpenCL devices in the context:

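import tvm
from tvm.contrib import graph_runtime

# graph and lib come from the heterogeneous build; pass both devices as the context.
ctx = [tvm.cpu(0), tvm.opencl(0)]
m = graph_runtime.create(graph, lib, ctx)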