Will TVM’s current quantization support the following situation? If so, where can I find more info?
- Input model is float32
- All weights must be transformed into an int8 fixed-point format
- Fixed-point parameters can be shared within a tensor, but do not need to be the same from tensor to tensor
- Data-aware calibration needs to find the min/max values of accumulators and pass this information from Relay down to code generation (what is the mechanism for this?)
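To be concrete about what I mean by the second and third bullets, here is a minimal NumPy sketch of symmetric per-tensor int8 quantization: one scale shared within each tensor, chosen independently per tensor. The function names are purely illustrative, not TVM API:

```python
import numpy as np

def quantize_per_tensor(w, num_bits=8):
    """Symmetric fixed-point quantization with one scale shared by the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    scale = np.max(np.abs(w)) / qmax            # shared within this tensor only
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map quantized values back to float32 for comparison."""
    return q.astype(np.float32) * scale

# Each tensor gets its own scale; scales need not match across tensors.
w1 = np.array([0.5, -1.0, 0.25], dtype=np.float32)
w2 = np.array([10.0, -20.0], dtype=np.float32)
q1, s1 = quantize_per_tensor(w1)
q2, s2 = quantize_per_tensor(w2)
```

The calibration question is then: given sample inputs, where in the pipeline are the accumulator min/max ranges collected, and how do they flow from Relay into codegen?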