TF Lite quantized conv2d operator conversion

I’m interested in the story around quantized networks.

In option 1, term 1: newer Arm cores (optionally from Armv8.2-A, mandatorily from Armv8.4-A onwards), including Neoverse N1, Cortex-A55 and others, implement a dot-product instruction that multiplies uint8 x uint8 vectors and accumulates into uint32 vectors (with a signed int8 counterpart); see the references below for more. This instruction exists in the AArch32 world as well as the AArch64 world, and there are also vector-by-element forms, where one operand is a single indexed element. Would these be useful here in accelerating conv2d or other schedules?
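To make the semantics concrete, here is a minimal scalar sketch (mine, for illustration only) of what the vector-by-vector UDOT form computes, based on my reading of reference 2 below: each 32-bit accumulator lane gains a 4-way uint8 dot product.

```c
#include <stdint.h>

/* Scalar reference for the 128-bit UDOT (vector) form:
 * acc[i] += a[4*i+0]*b[4*i+0] + ... + a[4*i+3]*b[4*i+3] for i = 0..3,
 * i.e. 16 multiplies and 16 accumulations folded into one instruction. */
static void udot_reference(uint32_t acc[4],
                           const uint8_t a[16],
                           const uint8_t b[16])
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            acc[i] += (uint32_t)a[4 * i + j] * (uint32_t)b[4 * i + j];
}
```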

I don’t believe LLVM’s loop vectorizer generates these instructions today; teaching it to do so would probably take quite a bit of work, and even then it might not produce the code we want.

From my understanding, this is something that could be experimented with by either:
a. using the inline assembler form of the instructions, or
b. lowering directly to an LLVM intrinsic for the dot-product instruction. Arm also publishes the ACLE (Arm C Language Extensions), which exposes these instructions as intrinsics for use within C and C++ applications (a sketch using them follows below).
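To show what route (b) looks like through ACLE, here is a hedged sketch of an inner uint8 reduction using the vdotq_u32 intrinsic. It assumes an AArch64 toolchain targeting something like -march=armv8.2-a+dotprod (so that __ARM_FEATURE_DOTPROD is defined) and that n is a multiple of 16; the helper name dot_u8 is made up for illustration. Within TVM, the same underlying LLVM intrinsic would be reached from the schedule rather than from C.

```c
#include <stdint.h>
#include <arm_neon.h>

#if defined(__ARM_FEATURE_DOTPROD)
/* Dot product of two uint8 buffers, n a multiple of 16.
 * This is the innermost reduction of a quantized conv2d / GEMM. */
static uint32_t dot_u8(const uint8_t *a, const uint8_t *b, int n)
{
    uint32x4_t acc = vdupq_n_u32(0);
    for (int i = 0; i < n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        /* Each 32-bit lane of acc accumulates a 4-way uint8 dot product
         * (one UDOT per 16 bytes of each operand). */
        acc = vdotq_u32(acc, va, vb);
    }
    return vaddvq_u32(acc); /* horizontal add of the four lanes (AArch64 only) */
}
#endif
```

The signed variant (vdotq_s32) and the by-element forms follow the same pattern.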

References

  1. https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions
  2. https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/udot-vector-dot-product-unsigned-arithmetic-vector (UDOT, vector by vector)
  3. https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/udot-by-element-dot-product-unsigned-arithmetic-vector-by-element (UDOT, vector by element)
  4. https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/sdot-by-element-dot-product-signed-arithmetic-vector-by-element (SDOT, vector by element)
  5. https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/sdot-vector-dot-product-signed-arithmetic-vector (SDOT, vector by vector)
  6. https://developer.arm.com/docs/ddi0597/latest/simd-and-floating-point-instructions-alphabetic-order/vsdot-by-element-dot-product-index-form-with-signed-integers (VSDOT, by element, for AArch32, i.e. the 32-bit Arm instruction set in both T32 and A32)
  7. https://developer.arm.com/docs/ddi0597/latest/simd-and-floating-point-instructions-alphabetic-order (AArch32 SIMD and floating-point instructions; see the VSDOT and VUDOT entries)

I hope this helps.

Regards,
Ramana

Of course, the dot-product instruction can accelerate q_conv2d. Currently we have to use the SMLAL instruction. If LLVM cannot handle this automatically, we could use tensorize, as has been done for x86.
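To make the comparison concrete, here is a hedged NEON sketch (the helper names are mine, for illustration only) of the two reduction styles for signed 8-bit data; the dot-product path additionally needs a target with the dotprod extension.

```c
#include <arm_neon.h>

/* Widening-multiply style reduction (SMULL/SMLAL-class instructions plus a
 * pairwise widening accumulate): roughly four instructions per 16 bytes. */
static int32x4_t mla_16_bytes(int32x4_t acc, int8x16_t a, int8x16_t b)
{
    int16x8_t lo = vmull_s8(vget_low_s8(a), vget_low_s8(b));
    int16x8_t hi = vmull_s8(vget_high_s8(a), vget_high_s8(b));
    acc = vpadalq_s16(acc, lo);
    acc = vpadalq_s16(acc, hi);
    return acc;
}

#if defined(__ARM_FEATURE_DOTPROD)
/* Dot-product style reduction: a single SDOT per 16 bytes. The per-lane
 * grouping differs from the version above (adjacent quads vs. spread pairs),
 * but the sum over all four lanes is identical. */
static int32x4_t dot_16_bytes(int32x4_t acc, int8x16_t a, int8x16_t b)
{
    return vdotq_s32(acc, a, b);
}
#endif
```

Per 16 bytes of each operand this is roughly four instructions versus one, which is where the speedup for the q_conv2d inner loop would come from; a tensorize intrinsic could emit exactly this kind of micro-kernel.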

Thanks. My aim was to bring this into the design discussion now and draw attention to the feature.

Thanks @ramana-arm, this is very helpful. TVM provides a feature to directly call an LLVM intrinsic, as @FrozenGene mentioned.