- Strides are supported and tested via tvm-clj.
- The code is much clearer.
- Double is supported as well as float.
I will do a bit more testing. I searched the codebase, and it appears the gemm code was simply copied and pasted into each of the contrib matmuls, with little apparent effort to unify the approaches.
I propose:
- A header file with a unified approach that all implementations can use.
- Update all matmuls; they would then mainly just forward to the unified approach with different gemm operators.
- Add support for optional alpha, beta parameters. Old code would continue to work with no changes; the argvec.
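
To make the proposal concrete, here is a rough sketch of the forwarding idea, written in plain Python rather than the C++ the contrib code actually uses. The names `call_matmul` and `naive_gemm` are hypothetical, and the assumption that alpha and beta arrive as two optional trailing arguments is mine; the real header could look quite different.

```python
def naive_gemm(a, b, c, alpha, beta):
    """Stand-in for a backend gemm operator (hypothetical).

    Computes C <- alpha * (A @ B) + beta * C in place on lists of lists.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    for i in range(rows):
        for j in range(cols):
            dot = sum(a[i][k] * b[k][j] for k in range(inner))
            # With beta == 0 the old contents of C are ignored entirely.
            old = beta * c[i][j] if beta != 0 else 0.0
            c[i][j] = alpha * dot + old


def call_matmul(args, gemm_op):
    """Hypothetical unified entry point that every backend forwards to.

    args is (A, B, C) for existing callers, or (A, B, C, alpha, beta)
    for callers that opt in to the new parameters.
    """
    if len(args) not in (3, 5):
        raise ValueError("expected (A, B, C) or (A, B, C, alpha, beta)")
    a, b, c = args[:3]
    # Defaults reproduce the old behaviour: C is overwritten with A @ B.
    alpha, beta = args[3:] if len(args) == 5 else (1.0, 0.0)
    gemm_op(a, b, c, alpha, beta)
    return c
```

Each backend would pass its own operator in place of `naive_gemm`, so the argument handling and defaults live in one place.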
This would update all matmuls in contrib. I cannot test all of those options myself, however, so I would need help with that. I can test mkl, cpu, cuda, and, if the Intel or NVIDIA OpenCL implementations support it, the rocblas pathway.
Does this sound like a good plan? This may allow tvm to be used in systems that, for instance, are using gemm to sum gradients into an accumulator (beta of 1).
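
For anyone unfamiliar with the accumulator use case: gemm computes C <- alpha * A * B + beta * C, so a beta of 1 adds each new product onto whatever C already holds. A minimal pure-Python illustration follows; the `gemm` helper and the toy batch data are made up for the example and are not TVM code.

```python
def gemm(a, b, c, alpha=1.0, beta=0.0):
    """In place: C <- alpha * (A @ B) + beta * C, on lists of lists."""
    for i, row in enumerate(a):
        for j in range(len(b[0])):
            dot = sum(row[k] * b[k][j] for k in range(len(b)))
            c[i][j] = alpha * dot + (beta * c[i][j] if beta else 0.0)


# Two made-up per-batch factor pairs whose products should be summed.
batches = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]),
    ([[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]),
]

grad = [[0.0, 0.0], [0.0, 0.0]]
for x, dy in batches:
    # beta = 1 keeps what is already in grad and adds the new product.
    gemm(x, dy, grad, alpha=1.0, beta=1.0)
```

Without a beta parameter, the same result needs a temporary buffer for each product plus a separate elementwise add.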