I changed line 136 in the tensorize tutorial (https://github.com/dmlc/tvm/blob/f7beea4bcb396b02e9cb7d818d40be11ffe0651a/tutorials/language/tensorize.py) to:
extern "C" int gemv_update(float *cc, float *bb, float *aa, int m, int l, int stride) {
Note that the only change is that I’ve swapped aa and bb in the parameter list. When I run this modified version, I get the following error:
AssertionError:
Not equal to tolerance rtol=0.001, atol=1e-07
x and y nan location mismatch:
x: array([[ 1.354377e+01, 1.394581e+01, 1.402019e+01, …,
1.696026e+01, 1.639488e+01, 1.942286e+01],
[ 1.394581e+01, 1.402019e+01, 1.325300e+01, …,…
y: array([[ 13.543765, 13.735253, 15.58291 , …, 15.913888, 14.673822,
15.38301 ],
[ 13.945811, 15.701975, 17.406521, …, 17.366894, 17.116966,…
It seems that the order of tensors returned by ComputeOpNode::InputTensors() for the compute on line 76 must match the order expected by the intrinsic’s C implementation on line 136. Is my understanding correct? If so, how should the user determine the correct order in general?
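To illustrate what I think is happening, here is a minimal pure-Python sketch of a positional buffer mismatch. It rests on assumptions: the loop body is my reading of what gemv_update does (cc[i] += aa[j] * bb[i * stride + j]), and the names vec, mat, out_ok and swapped_failed are hypothetical, not taken from the tutorial. The callee binds buffers purely by position, so if the caller passes the matrix where the vector is expected, the indexing no longer matches the data layout.

```python
def gemv_update(cc, aa, bb, m, l, stride):
    """Assumed kernel: cc[i] += sum_j aa[j] * bb[i * stride + j] on flat buffers."""
    for i in range(m):
        for j in range(l):
            cc[i] += aa[j] * bb[i * stride + j]
    return 0


m, l = 2, 3
vec = [1.0, 2.0, 3.0]            # hypothetical vector operand, length l
mat = [1.0, 0.0, 0.0,
       0.0, 1.0, 0.0]            # hypothetical m x l matrix, row-major, stride l

# Buffers passed in the order the kernel expects: (vector, matrix).
out_ok = [0.0] * m
gemv_update(out_ok, vec, mat, m, l, l)

# Same kernel, buffers passed in the opposite order: (matrix, vector).
# The kernel now indexes the 3-element vector as if it were the 2 x 3 matrix.
try:
    gemv_update([0.0] * m, mat, vec, m, l, l)
    swapped_failed = False
except IndexError:
    swapped_failed = True

print(out_ok, swapped_failed)
```

In Python the out-of-range access raises IndexError. In C the same access would read whatever memory happens to sit past the end of the buffer without raising an error, which would be consistent with the mismatched values in the assertion output above.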