FP Precision: Batchnorm

I have observed a floating-point precision mismatch between TensorFlow CPU output and TVM output for batchnorm ops. It shows up only for certain input distributions, and most noticeably when comparing the variance.
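To make the kind of mismatch concrete, here is a minimal sketch in plain Python. It is not the actual TensorFlow or TVM code, and I am only assuming that a difference in evaluation order could be one source of the gap. It simulates float32 arithmetic and evaluates batchnorm in two algebraically equivalent ways. The helper names (f32, batchnorm_direct, batchnorm_folded) and the sample values are hypothetical, chosen for illustration only.

```python
import math
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def batchnorm_direct(x, mean, var, gamma, beta, eps):
    """y = (x - mean) / sqrt(var + eps) * gamma + beta, rounded to float32 at each step."""
    x, mean, var, gamma, beta, eps = map(f32, (x, mean, var, gamma, beta, eps))
    d = f32(x - mean)
    s = f32(math.sqrt(f32(var + eps)))
    return f32(f32(f32(d / s) * gamma) + beta)


def batchnorm_folded(x, mean, var, gamma, beta, eps):
    """Same formula, rewritten as y = x * scale + shift (an assumed alternative ordering)."""
    x, mean, var, gamma, beta, eps = map(f32, (x, mean, var, gamma, beta, eps))
    s = f32(math.sqrt(f32(var + eps)))
    scale = f32(gamma / s)
    shift = f32(beta - f32(mean * scale))
    return f32(f32(x * scale) + shift)


# Hypothetical sample: small variance, mean far from zero.
args = dict(x=100.5, mean=100.0, var=1e-6, gamma=1.0, beta=0.0, eps=1e-3)
direct = batchnorm_direct(**args)
folded = batchnorm_folded(**args)
print("direct:", direct)
print("folded:", folded)
print("abs diff:", abs(direct - folded))
```

With a small variance, 1/sqrt(var + eps) is large, so any rounding difference in the intermediate products is amplified. That would be consistent with the mismatch depending on the input distribution, although I have not confirmed this is what actually happens in either framework.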

The TensorFlow compile options that look significant here are '-march=native' and '-msse3'.

Any advice on this? How do we enable the equivalent options for our LLVM target?

Thanks, Siva