Error in tensorize tutorial with small change


#1

I changed line 136 in the tensorize tutorial (https://github.com/dmlc/tvm/blob/f7beea4bcb396b02e9cb7d818d40be11ffe0651a/tutorials/language/tensorize.py) to:

extern "C" int gemv_update(float *cc, float *bb, float *aa, int m, int l, int stride) {

Note that I’ve just switched aa and bb. When I run this modified version, I get the following error:

AssertionError:
Not equal to tolerance rtol=0.001, atol=1e-07

x and y nan location mismatch:
x: array([[ 1.354377e+01, 1.394581e+01, 1.402019e+01, …,
1.696026e+01, 1.639488e+01, 1.942286e+01],
[ 1.394581e+01, 1.402019e+01, 1.325300e+01, …,…
y: array([[ 13.543765, 13.735253, 15.58291 , …, 15.913888, 14.673822,
15.38301 ],
[ 13.945811, 15.701975, 17.406521, …, 17.366894, 17.116966,…

It seems that the order of tensors returned by ComputeOpNode::InputTensors() for the compute on line 76 must match the order expected by the intrinsic’s C implementation on line 136. Is my understanding correct? If so, how should the user determine the correct order in general?


#2

In general, matrix multiplication is not commutative, so this restriction makes sense.
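
To make the effect of the swap concrete, here is a minimal pure-Python sketch. It assumes the loop body of gemv_update is the one in the tutorial (cc[i] += aa[j] * bb[i * stride + j]); the buffers and sizes are made up for illustration. The vector buffer is deliberately longer than l, to mimic a row pointer into a larger matrix, so the swapped call reads neighbouring memory instead of failing.

```python
def gemv_update(cc, aa, bb, m, l, stride):
    """Python transcription of the assumed C loop: aa is a vector, bb a strided matrix."""
    for i in range(m):
        for j in range(l):
            cc[i] += aa[j] * bb[i * stride + j]
    return 0

m = l = stride = 2
vec_buf = [1.0, 2.0, 3.0, 4.0]  # first l entries are the vector; the rest is adjacent memory
mat_buf = [1.0, 0.0, 0.0, 1.0]  # 2x2 identity matrix, row-major

correct = [0.0, 0.0]
gemv_update(correct, vec_buf, mat_buf, m, l, stride)  # order the loop body expects

swapped = [0.0, 0.0]
gemv_update(swapped, mat_buf, vec_buf, m, l, stride)  # pointers passed in the other order

print(correct)  # [1.0, 2.0]
print(swapped)  # [1.0, 3.0]
```

The swapped call still runs, but the matrix is indexed as a vector and the vector as a strided matrix, so the result silently differs.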


#3

Thanks for the reply. Yes, I agree that it is important to get the order of aa and bb correct, since matrix multiplication is not commutative. My question is: how do you know the correct order in more complicated cases?

For example, suppose this is the pattern of computation that should be tensorized:

import tvm
# Create placeholders A..F, each named after its variable
A, B, C, D, E, F = (tvm.placeholder((10, ), name=n) for n in 'ABCDEF')
G = tvm.compute((10, ), lambda i: F[i]*(A[i]+B[i]+C[i]+D[i]+E[i]), name='G')

In the above case, in what order must the hardware intrinsic accept its parameters?
Is the order determined simply by scanning the lambda expression left to right, i.e., F, A, B, C, D, E?
Or is it determined by the expression’s AST, in which case the order would be A, B, C, D, E, F?
Since the order is not stated explicitly anywhere, it can be difficult for the user to get right!


#4

I am confused by this question; the AST does not need to be considered in this case. As every piece of tensorized code has a backing implementation, you must match the ordering and semantics defined in the backing implementation. This seems to be just an issue of documenting each intrinsic specification thoroughly.


#5

Yes, I hope we can get on the same page! I appreciate your help.

Note that the modified version of the tutorial, which gives the error, can be fixed by changing lines 95 and 96 from:

95                                 aa.access_ptr("r"),
96                                 bb.access_ptr("r"),

to:

95                                 bb.access_ptr("r"),
96                                 aa.access_ptr("r"),

and similarly for lines 242 and 243. This compensates for the original change in line 136. Running the program now succeeds without error.

However, note that I could only apply this fix because I know the order of the tensors in ‘ins’. If there were more than two tensor inputs, how would I know their order in ‘ins’? I have been trying to answer this question myself by stepping through the Python and C++ code. What I found is that ‘ins’ is populated by a PostOrderVisit traversal of the lambda expression on line 76. Is that correct? Is this documented anywhere, and could it change in the future (which could break existing programs)?
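
As an illustration of what such a traversal would yield for the expression from my earlier example, here is a small sketch that uses Python's own ast module, not TVM's IR. The helper name post_order_tensor_names is made up, and whether TVM's visitor behaves the same way is exactly the assumption I am asking about.

```python
import ast

def post_order_tensor_names(expr_source):
    """Return subscripted names in post-order, first occurrence only.

    This walks Python's AST as a stand-in; it is not TVM's PostOrderVisit.
    """
    tree = ast.parse(expr_source, mode="eval")
    seen = []

    def visit(node):
        # Post-order: visit all children before handling the node itself.
        for child in ast.iter_child_nodes(node):
            visit(child)
        if isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name):
            if node.value.id not in seen:
                seen.append(node.value.id)

    visit(tree)
    return seen

order = post_order_tensor_names("F[i]*(A[i]+B[i]+C[i]+D[i]+E[i])")
print(order)  # ['F', 'A', 'B', 'C', 'D', 'E']
```

Under that assumption the order would be the left-to-right order in which the tensors first appear, with F first.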


#6

Okay, I think I understand what you are asking now. In practice, most intrinsics do not take many tensors as input, and intuition is good enough to know the order of the arguments (e.g., a first, then b, for the matmul example). However, if you are ever unsure, you can give each placeholder a name in TVM and then print ins, which will show the exact order of the tensors.


#7

OK, that works! It was a bit counterintuitive to me that you need to actually run the program to get this information. Thanks for your help in thinking about it a different way.


#8

@eqy, thanks for looking into this.
Wouldn’t it be better to fix the parameter order of the intrinsic explicitly, similar to how a function signature does in C++? That would make the order independent of what the lambda expression looks like.
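
A rough sketch of the idea, in plain Python rather than TVM: the intrinsic declares the names it expects, in order, and the inputs are bound by name instead of by position. FakeTensor and bind_by_name are hypothetical stand-ins invented for this example; nothing like them is claimed to exist in TVM.

```python
from collections import namedtuple

# Hypothetical stand-in for a named tensor; only the name matters here.
FakeTensor = namedtuple("FakeTensor", ["name"])

def bind_by_name(ins, signature):
    """Reorder ins to follow signature, a list of expected tensor names."""
    by_name = {t.name: t for t in ins}
    missing = [n for n in signature if n not in by_name]
    if missing:
        raise KeyError("no input tensor named: " + ", ".join(missing))
    return [by_name[n] for n in signature]

# Whatever order the inputs were discovered in...
ins = [FakeTensor("F"), FakeTensor("A"), FakeTensor("B")]
# ...the intrinsic receives them in the order it declared.
ordered = bind_by_name(ins, ["A", "B", "F"])
print([t.name for t in ordered])  # ['A', 'B', 'F']
```

With something like this, reordering the lambda expression would not change what the intrinsic receives, and a missing name would fail loudly instead of producing wrong numbers.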