I’ve noticed that relay.Var is always used for model weights, even when those weights are declared as constants in the framework that they were imported from.
This seems problematic to me because:
- We aren’t able to apply constant folding to the model weights. Any normalization, scaling, reshape, or transpose applied to weights must be recomputed on every inference. Such operations on weights are not uncommon and can also be introduced by Relay passes such as ConvertLayout.
- For external codegen, many third-party inference libraries require the weights for conv, BN, dense, etc. to be constant C arrays.
- It is confusing. If an input is variable, it should be a Var; if it is constant, it should be a Constant.
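To make the first point concrete, here is a small NumPy sketch of what folding a weight transform buys us (this is an illustration of the general idea, not TVM code; the function names are made up for the example). With weights treated as variables, the transpose runs on every inference; with weights treated as constants, a pass like FoldConstant could precompute it once:

```python
import numpy as np

# Weight matrix in (out_features, in_features) layout, as a dense layer expects.
W = np.arange(12, dtype="float32").reshape(4, 3)

def infer_unfolded(x):
    # Weights-as-Var case: the transpose is recomputed on every call.
    return x @ W.T

# Weights-as-Constant case: the transpose can be folded ahead of time, once.
W_T = W.T.copy()

def infer_folded(x):
    return x @ W_T

x = np.ones((1, 3), dtype="float32")
assert np.allclose(infer_unfolded(x), infer_folded(x))
```

Both functions compute the same result, but only the folded form avoids redoing the weight transform per inference, which is exactly what FoldConstant could do for us if the weights were relay.Constant.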
I’m also curious what exactly happens when a Var is designated as a “param”? Is it possible to apply constant folding to params?
Is it simply a matter of updating the frontend parsers to use relay.Constant, or are there other reasons why Var was chosen?
Thank you!