Ssd_inceptionv3_512 wrong result on Intel Atom with opencl -device=intel_graphics

I tried to compile and run the mxnet-ssd ssd_inceptionv3_512 model on Intel graphics (DeepLens, Atom E3930). I tested two TVM versions: the Feb 7th version, and the Mar 7th version with PR 2747.

It compiles without any errors.

It gives the correct result with target: opencl, target_host: llvm, but it is very slow (16,851 ms on the Atom). Result:

time: 16851 ms
1 car [6.         0.97556907 0.599846   0.1386168  0.9068787  0.29633522]
2 bicycle [1.         0.7629443  0.18050349 0.2417408  0.75496024 0.76243854]

But it gives an incorrect inference result on the Intel Atom if I set target: opencl -device=intel_graphics -model=unknown, target_host: llvm. Result:

time: 1987 ms
1 pottedplant [15.          0.7816069   0.          0.07468978  1.          0.962857  ]

The correct result should be car, dog and bicycle.
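To make the mismatch easier to see at a glance, here is a minimal sketch that turns the printed rows into class names and diffs the two runs. It assumes each row is laid out as [class_id, score, xmin, ymin, xmax, ymax] and that class ids follow the usual Pascal VOC ordering (which matches the ids and names printed above). The `VOC_CLASSES` list and the `labels` helper are illustrative names, not part of the repo.

```python
# Pascal VOC class names in the index order commonly used by mxnet-ssd
# (assumption: the model in this report uses the same ordering).
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def labels(rows, threshold=0.5):
    """Return the set of class names whose score is at least `threshold`.

    Each row is assumed to be [class_id, score, xmin, ymin, xmax, ymax];
    rows with a negative class id are treated as "no detection".
    """
    return {
        VOC_CLASSES[int(row[0])]
        for row in rows
        if row[0] >= 0 and row[1] >= threshold
    }


# Rows copied from the two runs quoted in this report.
plain_opencl = [
    [6.0, 0.97556907, 0.599846, 0.1386168, 0.9068787, 0.29633522],
    [1.0, 0.7629443, 0.18050349, 0.2417408, 0.75496024, 0.76243854],
]
intel_graphics = [
    [15.0, 0.7816069, 0.0, 0.07468978, 1.0, 0.962857],
]

expected = labels(plain_opencl)
actual = labels(intel_graphics)
print("missing:", sorted(expected - actual))
print("unexpected:", sorted(actual - expected))
```

The 0.5 score threshold is an arbitrary choice for this sketch; run-ssd.py may use a different cutoff.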

I prepared code that helps reproduce the incorrect output.

Clone my mxnet-ssd-tvm repo:

git clone git@github.com:apivovarov/mxnet-ssd-tvm.git

# combine deploy_ssd_inceptionv3_512-0000.params parts
cd deploy_ssd_inceptionv3_512
cat deploy_ssd_inceptionv3_512-0000.params.a* > deploy_ssd_inceptionv3_512-0000.params

Compile the ssd_inceptionv3_512 model on the main computer (with tvm + nnvm + topi installed):

python3 compile.py

# it will generate 3 files for intel graphics device.
model.so
model.json
model.params

Copy the files to the device with Atom Intel graphics and run the model:

python3 run-ssd.py

# it will print
time: 1999 ms
1 chair [8.         0.5014031  0.         0.067276   1.         0.93533254]
# or 
time: 1987 ms
1 pottedplant [15.          0.7816069   0.          0.07468978  1.          0.962857  ]
# or
time: 2009 ms
1 pottedplant [15.          0.6881262   0.          0.03234622  0.9601488   0.97136605]
2 bird [2.         0.55163026 0.41290188 0.53170156 0.47892678 0.65033484]

I ran inference on an Intel i7-8700 using the same compiled model files and got the CORRECT result. So the issue can be reproduced only on the Intel Atom CPU/GPU. I used an Intel® Atom™ Processor E3930 @ 1.30GHz.
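When comparing the output of the same model files across devices, it can help to check agreement automatically rather than by eye. The following is a rough sketch under the assumption that rows are [class_id, score, xmin, ymin, xmax, ymax] with normalized box coordinates; `iou` and `detections_match` are hypothetical helper names, and the 0.5 IoU threshold is an arbitrary choice.

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detections_match(reference, candidate, iou_threshold=0.5):
    """True if both runs report the same number of detections and every
    reference row has a candidate row with the same class id and an
    overlapping box (IoU at or above the threshold)."""
    if len(reference) != len(candidate):
        return False
    for ref in reference:
        found = any(
            int(ref[0]) == int(cand[0]) and iou(ref[2:6], cand[2:6]) >= iou_threshold
            for cand in candidate
        )
        if not found:
            return False
    return True


# Rows copied from the runs quoted in this report.
reference_rows = [
    [6.0, 0.97556907, 0.599846, 0.1386168, 0.9068787, 0.29633522],
    [1.0, 0.7629443, 0.18050349, 0.2417408, 0.75496024, 0.76243854],
]
atom_rows = [
    [15.0, 0.7816069, 0.0, 0.07468978, 1.0, 0.962857],
]

print("Atom output matches reference:", detections_match(reference_rows, atom_rows))
```

Scores are deliberately ignored here, since small floating-point differences between devices are expected even when the detections agree.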

@apivovarov I got an error with opencl over remote rpc; however, the arm-cpu result is correct.