@egy,
Right now we have three parts: Body, Zero, Update
- Body (mandatory): initializes the accumulators to zero, then does the computation.
- Zero (can be None): only initializes the accumulators to zero, no computation.
- Update (can be None): does the computation without any initialization (accumulate only).
Cases:
- If Zero=None, a Body() followed by Update() will be issued.
- If Update=None, only Body() is used everywhere.
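
To make the cases above concrete, here is a rough sketch in plain Python (not TVM code) of which steps would be issued for a reduction split into several chunks. The function name `plan_steps` and its arguments are made up for illustration. The branch where both Zero and Update are given is my assumption (one Zero, then Update for every chunk), since only the two None cases are spelled out above.

```python
from typing import List


def plan_steps(num_chunks: int, has_zero: bool, has_update: bool) -> List[str]:
    """Return the step names issued for a reduction of `num_chunks` chunks.

    Hypothetical helper, assumes num_chunks >= 1.
    """
    if not has_update:
        # Update=None: only Body() is used everywhere.
        return ["Body"] * num_chunks
    if not has_zero:
        # Zero=None: Body() handles init + first chunk, Update() the rest.
        return ["Body"] + ["Update"] * (num_chunks - 1)
    # Assumption: with both present, init once and accumulate every chunk.
    return ["Zero"] + ["Update"] * num_chunks


if __name__ == "__main__":
    print(plan_steps(3, has_zero=False, has_update=False))
    print(plan_steps(3, has_zero=False, has_update=True))
    print(plan_steps(3, has_zero=True, has_update=True))
```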
See also: Update rule for tensorize
Question:
May I implement a separate Store() (as an optional fourth part) in a PR, with matching test cases?
Imagine hardware that needs a separate Store step at the very end to move the results from hidden accumulators to the final memory destination.
I think this would be useful for many hardware targets; at the moment I need it for tensorization in MARLANN.
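
To illustrate what I mean by a separate Store step, here is a toy model in plain Python. Everything in it (`ToyAccelerator`, `dot_in_chunks`, the single scalar accumulator) is hypothetical and only meant to show the idea; it is not the MARLANN interface and not a proposed TVM API. The point is that the result stays in a hidden accumulator and is not visible in memory until Store is issued.

```python
from typing import List


class ToyAccelerator:
    """Hypothetical device: results live in a hidden accumulator until stored."""

    def __init__(self, mem_size: int) -> None:
        self.acc = 0                    # hidden accumulator, not addressable
        self.memory = [0] * mem_size    # visible destination memory

    def zero(self) -> None:
        # Zero: only reset the accumulator.
        self.acc = 0

    def update(self, a: List[int], b: List[int]) -> None:
        # Update: accumulate a partial dot product, no initialization.
        self.acc += sum(x * y for x, y in zip(a, b))

    def store(self, dst: int) -> None:
        # Store: final step, copy the hidden accumulator to memory.
        self.memory[dst] = self.acc


def dot_in_chunks(dev: ToyAccelerator, a: List[int], b: List[int],
                  chunk: int, dst: int) -> int:
    """Issue Zero, then one Update per chunk, then a single Store."""
    dev.zero()
    for i in range(0, len(a), chunk):
        dev.update(a[i:i + chunk], b[i:i + chunk])
    dev.store(dst)
    return dev.memory[dst]


if __name__ == "__main__":
    dev = ToyAccelerator(mem_size=4)
    print(dot_in_chunks(dev, [1, 2, 3, 4], [5, 6, 7, 8], chunk=2, dst=1))
```

Without the final `store` call the destination memory would never see the result, which is why Body/Zero/Update alone are not enough for this kind of hardware.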