Softmax layer not working on Raspberry Pi 4

elenkalda-arm · October 1, 2019, 4:31pm

Hi everyone,

I am trying to run the tutorial Deploy the Pretrained Model on Raspberry Pi on a Raspberry Pi 4 running Raspbian GNU/Linux 10, and I am running into a weird problem: instead of classifying that picture of a kitten as a cat, it amusingly classifies it as a tench. Digging deeper, it appears that all the entries of the output vector are NaN-s and that the layer producing those NaN-s is the softmax layer. Has anyone else had similar problems?

thierry · October 1, 2019, 6:56pm

That’s odd. Have you been able to run the same tutorial on the Raspberry Pi 3 without issues?

elenkalda-arm · October 2, 2019, 5:09pm

We tried running it today on a Raspberry Pi 3 and the same thing happened (we used the beginning of that tutorial to build the runtime on the Pi). However, there are no issues with running nets on a Hikey960, which has a 64-bit OS, so it is something specific to a 32-bit runtime.

We checked the behaviour of some of the Relay operators and some of them fail in an odd way:
(1) softmax always produces NaN-s (log_softmax works fine though)
(2) exp and log output whatever was given as an input
(3) sigmoid always returns ones
I can’t see any obvious pattern in these failures… What is the best way of debugging a runtime?
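
One way to narrow this kind of failure down is to compare each operator's output from the device against a reference computed on the host. The sketch below is only an illustration and uses nothing but the Python standard library: the names `reference` and `check`, the verdict strings, and the tolerance are all hypothetical, and it assumes the device output has already been copied back as a plain list of floats.

```python
import math

def reference(op, xs):
    """Host-side reference for a few operators (hypothetical helper)."""
    if op == "exp":
        return [math.exp(x) for x in xs]
    if op == "log":
        return [math.log(x) for x in xs]
    if op == "sigmoid":
        return [1.0 / (1.0 + math.exp(-x)) for x in xs]
    if op == "softmax":
        m = max(xs)  # subtract the max for numerical stability
        es = [math.exp(x - m) for x in xs]
        total = sum(es)
        return [e / total for e in es]
    raise ValueError(f"no reference for {op!r}")

def check(op, xs, device_out, tol=1e-5):
    """Classify one operator's device output against the host reference."""
    if any(math.isnan(y) for y in device_out):
        return "nan"
    expected = reference(op, xs)
    if all(abs(y - e) <= tol for e, y in zip(expected, device_out)):
        return "ok"
    # Output equal to the input suggests the operator did nothing at all.
    if all(abs(y - x) <= tol for x, y in zip(xs, device_out)):
        return "passthrough"
    return "mismatch"
```

With the symptoms listed above, `check("exp", xs, xs)` would report `"passthrough"`, a softmax output full of NaN-s would report `"nan"`, and a sigmoid that returns all ones would report `"mismatch"`.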

mbaret · October 7, 2019, 4:26pm

This turned out to be an issue with using LLVM 4.0. The -mfloat-abi=hard option does not work, for some reason, on LLVM versions < 6.0. PR https://github.com/dmlc/tvm/pull/4071 adds an appropriate error when LLVM < 6.0 is used.
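
For anyone who wants to guard against this in their own build scripts, a version check along these lines may help. This is a minimal sketch and not how the PR implements its error: the function names are hypothetical, and it assumes you already have the LLVM version as a string (for example the output of `llvm-config --version`).

```python
import re

MIN_LLVM = (6, 0)  # threshold taken from this thread

def parse_llvm_version(text):
    """Extract (major, minor) from a version string such as '4.0.1'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised LLVM version: {text!r}")
    return int(match.group(1)), int(match.group(2))

def llvm_new_enough(version_text):
    """True if the given LLVM version is at least MIN_LLVM."""
    return parse_llvm_version(version_text) >= MIN_LLVM
```

Here `llvm_new_enough("4.0.1")` is `False` and `llvm_new_enough("6.0.0")` is `True`, so a script could refuse to build for a 32-bit hard-float target in the first case.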