Do we need to explicitly enable NEON for ARMv8 devices?

apivovarov · November 4, 2019, 8:48pm

All ARMv8-based devices support Neon.

Do we need to explicitly say -mattr=+neon for ARMv8 devices?

        "pixel2":    ["-model=snapdragon835", "-target=arm64-linux-android -mattr=+neon"],
        "mate10":    ["-model=kirin970", "-target=arm64-linux-android -mattr=+neon"],
        "mate10pro": ["-model=kirin970", "-target=arm64-linux-android -mattr=+neon"],
        "p20":       ["-model=kirin970", "-target=arm64-linux-android -mattr=+neon"],
        "p20pro":    ["-model=kirin970", "-target=arm64-linux-android -mattr=+neon"],

github.com

apache/incubator-tvm/blob/master/python/tvm/target.py#L472


This function will also download pre-tuned op parameters when there is none.


Parameters
----------
model: str
    SoC name or phone name of the arm board.
options : str or list of str
    Additional options
"""
trans_table = {
    "pixel2":    ["-model=snapdragon835", "-target=arm64-linux-android -mattr=+neon"],
    "mate10":    ["-model=kirin970", "-target=arm64-linux-android -mattr=+neon"],
    "mate10pro": ["-model=kirin970", "-target=arm64-linux-android -mattr=+neon"],
    "p20":       ["-model=kirin970", "-target=arm64-linux-android -mattr=+neon"],
    "p20pro":    ["-model=kirin970", "-target=arm64-linux-android -mattr=+neon"],
    "rasp3b":    ["-model=bcm2837", "-target=armv7l-linux-gnueabihf -mattr=+neon"],
    "rk3399":    ["-model=rk3399", "-target=aarch64-linux-gnu -mattr=+neon"],
    "pynq":      ["-model=pynq", "-target=armv7a-linux-eabi -mattr=+neon"],
    "ultra96":   ["-model=ultra96", "-target=aarch64-linux-gnu -mattr=+neon"],
}
pre_defined_opt = trans_table.get(model, ["-model=%s" % model])

ramana-arm · November 5, 2019, 1:20pm

Armv8-A has two execution states, AArch64 and AArch32. Even though the Arm ARM says Advanced SIMD is optional, AFAIK on OSes like Linux and Android, AArch64 (which matches arm64-linux-android or aarch64-linux-gnu) mandates the presence of FP and Advanced SIMD. Thus for the aarch64 targets, I think -mattr=+neon is actually superfluous.

On AArch32, which covers both the A32 and T32 ISAs in the 32-bit world, you would need to put -mattr=+neon to ensure that Neon is generated, as the defaults could well be conservative. There are options to have devices without FP or SIMD, though that’s a bit rare in the Android world these days.

Ramana
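
The rule of thumb in the post above can be written down as a small check on the target triple. This is only a sketch: `needs_explicit_neon` is a hypothetical helper, not part of TVM or LLVM, and it assumes the architecture is the first component of the triple and that AArch64 Linux/Android always provides Advanced SIMD, as described above.

```python
def needs_explicit_neon(triple):
    """Return True if -mattr=+neon should be spelled out for this triple.

    Hypothetical helper for illustration only; not a TVM or LLVM API.
    """
    # The architecture is assumed to be the first dash-separated component.
    arch = triple.split("-", 1)[0].lower()

    # AArch64 triples: FP and Advanced SIMD are taken as always present,
    # so the flag is redundant. Checked first because "arm64" also
    # starts with "arm".
    if arch.startswith(("aarch64", "arm64")):
        return False

    # AArch32 triples (armv7a, armv7l, thumb, ...): defaults may be
    # conservative, so ask for Neon explicitly.
    return arch.startswith(("arm", "thumb"))
```

For example, `needs_explicit_neon("armv7l-linux-gnueabihf")` is true while `needs_explicit_neon("arm64-linux-android")` is false, matching the split between the rasp3b/pynq entries and the aarch64 entries in the table quoted earlier.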

janimesh · November 5, 2019, 8:54pm

@ramana-arm Do we also need a +vfp4 mattr flag for armv7?

apivovarov · November 5, 2019, 9:42pm

For armeabi-v7a Android platform we should consider the following options:

+neon
or +vfp3d16
or +vfp3,+d32
+thumb2

https://developer.android.com/ndk/guides/abis#sa

To get the full list of LLVM llc mcpu and mattr values for arm, run

llc -march=arm -mcpu=help

llc-arm-mcpu-mattr.out

Most popular TVM target line for ARMv7 Android phones is probably:

llvm -device=arm_cpu -target=armv7a-linux-androideabi -mfloat-abi=soft -mattr=+neon,+thumb2

Android Supported Instruction Sets

More TVM target lines for other Android architectures are here

ramana-arm · November 5, 2019, 9:36pm

Pedantically speaking, there is no armv7; it’s either armv7-a or armv7ve. In the GCC world we’ve used -march=armv7 as really the common subset of armv7-a, armv7-r and armv7-m, which is thumb2, integer only and really a virtual ISA.

I suspect what you are referring to is the Neon instruction set that is part of the Cortex-A15, which should be accessible with the -mfpu=neon-vfpv4 command line option. I don’t know if that is accessible with the magic -mattr command line option from llvm.

The best reference I have for this is a slightly older blog post, written in a GCC context by a good friend of mine (https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/arm-cortex-a-processors-and-gcc-command-lines). I suspect the LLVM command lines are similar, but I can easily check that with our friendly neighbourhood llvm team too.

regards
Ramana

apivovarov · November 5, 2019, 9:41pm

The full list of mcpu and mattr for arm for LLVM is here llc-arm-mcpu-mattr.out

We can provide a comma-separated list of features to -mattr.
For neon-vfpv4 and thumb2 it will be -mattr=+neon,+vfp4,+thumb2
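
As a small illustration of the comma-separated form, here is a sketch that assembles the flag from a list of feature names. `build_mattr` is a hypothetical helper, not something TVM or LLVM provides, and it does not check that the feature names are ones llc actually knows.

```python
def build_mattr(features):
    """Join feature names into a single -mattr=... option.

    Hypothetical helper for illustration; feature names are not validated
    against the llc help output.
    """
    parts = []
    for feature in features:
        feature = feature.strip()
        if not feature:
            continue
        # A bare name means "enable"; an explicit leading + or - is kept.
        if feature[0] not in "+-":
            feature = "+" + feature
        # Drop exact repeats while keeping the original order.
        if feature not in parts:
            parts.append(feature)
    return "-mattr=" + ",".join(parts)
```

Calling `build_mattr(["neon", "vfp4", "thumb2"])` gives the same string as the line above.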

ramana-arm · November 5, 2019, 10:46pm

I see - llc seems to have different command line options to clang and has an interface that seems to allow users to add and remove options.

I can see that from the help output.

1. vfpv4 implies add VFPv4 instructions, which for me is scalar FMA instructions.
2. neon implies add Neon instructions, which to me refers to the original Advanced SIMD / Neon instructions as per the original Armv7-A instruction set.
3. neon-vfpv4 implies add the Neon instructions that came in with the Neon unit that came in with VFPv4, which for all practical purposes implies vectorized FMA instructions.

Thus for me, given my knowledge of the ISA, -mattr=+neon,vfpv4 is confusing because it’s not obvious whether this is 1+2 above (I’m not aware of any actual implementation like this) or #3. Trying out clang suggests this is actually #3 above.

Further the combination of -mfloat-abi=soft with this is more confusing because in other places -mfloat-abi=soft actually means use of software floating point emulation and essentially means don’t emit any actual fp or simd instructions.

If you were targeting the Cortex-A53 or really an Armv8-A cpu that supported AArch32 mode, I suspect what you need is -march=armv8-a -mattr=+neon,fp-armv8,thumb or some such as you’d get additional rounding instructions that came in armv8.

regards
Ramana

apivovarov · November 5, 2019, 10:57pm

About -mfloat-abi=soft - Android NDK libs for armeabi-v7a are soft.

More readings on it: https://android.googlesource.com/platform/ndk/+/ndk-r12-release/docs/HardFloatAbi.md

janimesh · November 5, 2019, 11:22pm

You managed to put all my concerns and confusions in one post very concisely

For now, I am interested in a Cortex-A53 CPU running on 32-bit OS (basically, Raspberry PI 3).

ramana-arm · November 5, 2019, 11:22pm

@apivovarov

The document linked above is about why the Android world does not want to pass fp values in fp registers.

The -mfloat-abi=soft option to clang and -mfloat-abi=soft option to llc appear to mean different things. For llc it appears as though the --float-abi=soft/hard controls the parameter passing only and that is different from the clang option where -mfloat-abi is a tri-state where

-mfloat-abi=soft: software emulation for fp
-mfloat-abi=softfp: pass fp values in integer registers but produce FP and SIMD instructions (depending on the -mfpu option)
-mfloat-abi=hard: pass fp values in fp registers and produce FP and SIMD instructions (depending on the -mfpu option)

regards
Ramana
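
The clang tri-state described in the post above can be summarised as a lookup table. This is a sketch of that description only: the names `FloatAbi`, `CLANG_FLOAT_ABI` and `describe` are made up for illustration, and the table says nothing about how llc interprets its own float ABI option.

```python
from collections import namedtuple

# Two independent properties controlled by clang's -mfloat-abi, as described
# in the post above. All names here are hypothetical, for illustration only.
FloatAbi = namedtuple("FloatAbi", ["fp_args_in_fp_regs", "emits_fp_simd"])

CLANG_FLOAT_ABI = {
    "soft":   FloatAbi(fp_args_in_fp_regs=False, emits_fp_simd=False),
    "softfp": FloatAbi(fp_args_in_fp_regs=False, emits_fp_simd=True),
    "hard":   FloatAbi(fp_args_in_fp_regs=True,  emits_fp_simd=True),
}


def describe(abi):
    """Return a one-line summary of a clang -mfloat-abi value."""
    info = CLANG_FLOAT_ABI[abi]
    passing = "fp registers" if info.fp_args_in_fp_regs else "integer registers"
    codegen = "FP/SIMD instructions" if info.emits_fp_simd else "software emulation"
    return "-mfloat-abi=%s: fp values passed in %s, %s" % (abi, passing, codegen)
```

Seen this way, softfp is the middle case: the calling convention of soft with the code generation of hard.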

ramana-arm · November 5, 2019, 11:32pm

Sorry - 32 bit OS is not precise enough

If vanilla ubuntu armhf - then -march=armv8-a -mfloat-abi=hard -mattr=+neon,fp-armv8,thumb2 should be a good starting point.
If Android - then -march=armv8-a -mfloat-abi=softfp -mattr=+neon,fp-armv8,thumb2 should be good.
If Raspbian - I think that should be similar to the Ubuntu armhf option.

At least

seems to assume no split between target architecture and actual OS under consideration.

HTH
Ramana
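
The three starting points above, for a Cortex-A53 running in AArch32 mode, can be kept in a table keyed by OS. This is a sketch: the keys and the `starting_options` helper are hypothetical, the option strings are copied from the post as suggested starting points rather than verified settings, and the Raspbian entry simply reuses the Ubuntu armhf one because the post only says it "should be similar".

```python
# Suggested starting options for a Cortex-A53 in AArch32 mode, copied from
# the post above. The dictionary keys and helper name are hypothetical.
A53_AARCH32_OPTIONS = {
    "ubuntu-armhf": ["-march=armv8-a", "-mfloat-abi=hard",
                     "-mattr=+neon,fp-armv8,thumb2"],
    "android":      ["-march=armv8-a", "-mfloat-abi=softfp",
                     "-mattr=+neon,fp-armv8,thumb2"],
}
# Assumption from the post: Raspbian should be similar to Ubuntu armhf.
A53_AARCH32_OPTIONS["raspbian"] = list(A53_AARCH32_OPTIONS["ubuntu-armhf"])


def starting_options(os_name):
    """Return the suggested option string for the given OS name."""
    key = os_name.strip().lower()
    if key not in A53_AARCH32_OPTIONS:
        raise ValueError("no suggested options for OS %r" % os_name)
    return " ".join(A53_AARCH32_OPTIONS[key])
```

Keeping the OS as an explicit key makes visible the split between target architecture and OS that the post is pointing at: the same CPU gets a different float ABI depending on where it runs.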