Precision error in Conv2D operator

While running some tests, I noticed what looks like a precision error in the Conv2D operator when using the float32 type.

My goal is to implement the ONNX MNIST model, available in the ONNX Model Zoo, from scratch using the operators offered by TVM. But in the end the results were not satisfying (11% accuracy…).

So I decided to debug the model layer by layer. Starting with the first layer, the Conv2D, which takes the image as input and the weights as parameters, I noticed that the resulting matrix differs from the ONNX one.

This gist shows the way I tested the operator.
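
Roughly, the comparison looks like the sketch below. This is not the exact gist code: the shapes and padding are my reading of the MNIST first layer, the data is random, and a plain NumPy convolution stands in for the ONNX output.

```python
import numpy as np
import tvm
import tvm.topi.testing
from tvm import te, topi

# Random NCHW input (one 28x28 "image") and a first-layer weight tensor.
# 8 filters of 1x5x5 with padding 2 is my reading of the MNIST model.
data_np = np.random.rand(1, 1, 28, 28).astype("float32")
weight_np = np.random.rand(8, 1, 5, 5).astype("float32")

data = te.placeholder(data_np.shape, name="data", dtype="float32")
weight = te.placeholder(weight_np.shape, name="weight", dtype="float32")
conv = topi.nn.conv2d_nchw(data, weight, stride=1, padding=2, dilation=1)

# Default schedule, compiled for CPU.
s = te.create_schedule(conv.op)
func = tvm.build(s, [data, weight, conv], target="llvm")

dev = tvm.cpu(0)
out = tvm.nd.array(np.zeros([int(d) for d in conv.shape], dtype="float32"), dev)
func(tvm.nd.array(data_np, dev), tvm.nd.array(weight_np, dev), out)

# Reference convolution in plain NumPy (stand-in for the ONNX layer output).
ref = tvm.topi.testing.conv2d_nchw_python(data_np, weight_np, 1, 2)

# Largest absolute element of the difference matrix.
print(np.abs(out.numpy() - ref).max())
```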

If you look at the resulting difference matrix, the values differ by at most 1e-07, so I think this comes from a precision problem. The thing is that such a small difference ends up changing the network's output by a lot…
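
For reference, a difference of that size is within the relative/absolute tolerance normally used when comparing float32 results, e.g. with NumPy (the arrays below are dummy stand-ins for my two outputs):

```python
import numpy as np

# Dummy stand-ins for the TVM output and the ONNX output of the same layer.
tvm_out = np.random.rand(1, 8, 28, 28).astype("float32")
onnx_out = tvm_out + np.float32(1e-7) * np.random.rand(*tvm_out.shape).astype("float32")

# A discrepancy around 1e-07 passes a typical float32 tolerance check.
np.testing.assert_allclose(tvm_out, onnx_out, rtol=1e-5, atol=1e-6)
```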

Does anyone know how I can deal with this sort of issue?

This seems like a very small error, if I am reading the value right; it could be due to floating-point operation re-ordering introduced by a particular schedule optimization.
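
As a quick illustration of how much accumulation order alone can matter at float32 precision, here is a toy dot product the size of a 5x5 conv reduction (random data, unrelated to the actual model):

```python
import numpy as np

np.random.seed(0)
x = np.random.rand(25).astype("float32")  # one flattened 5x5 receptive field
w = np.random.rand(25).astype("float32")  # the matching 5x5 kernel

forward = np.float32(0.0)
for xi, wi in zip(x, w):
    forward = forward + xi * wi           # accumulate left to right

backward = np.float32(0.0)
for xi, wi in zip(x[::-1], w[::-1]):
    backward = backward + xi * wi         # same terms, opposite order

# Often a small nonzero value (last bits of the float32 result);
# neither sum is "wrong", only the accumulation order differs.
print(forward - backward)
```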

As for the 11% accuracy, we would need an error of much higher magnitude to explain that. I suspect the issue comes from somewhere else, perhaps?