VTA reordering and tensorization

Hi

I have a question regarding VTA reordering and tensorization.

In the VTA tutorial "Simple Matrix Multiply", there is a step that reorders
the computation by moving the outer reduction axis all the way out.
After that, axis[2] of C_buf is tensorized.
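
To make clear what I mean, here is how I picture the loop nest after that
step, written in plain NumPy. This is only my own sketch and not the
tutorial's code: the sizes and the names gemm_tile and reordered_matmul are
made up, and I am assuming the inner three loops (from axis[2] inward) are
what the tensorized GEMM replaces.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
O, N, M = 1, 4, 2                      # outer tiles: batch, reduction, output channel
BATCH, BLOCK_IN, BLOCK_OUT = 1, 16, 16  # assumed inner tile sizes

rng = np.random.default_rng(0)
A = rng.integers(-8, 8, size=(O, N, BATCH, BLOCK_IN)).astype(np.int32)
B = rng.integers(-8, 8, size=(M, N, BLOCK_OUT, BLOCK_IN)).astype(np.int32)


def gemm_tile(acc, a, b):
    """Stand-in for the inner (bi, ci, ki) loops: acc[bi, ci] += sum_ki a[bi, ki] * b[ci, ki]."""
    acc += a @ b.T


def reordered_matmul(A, B):
    o, n, batch, _ = A.shape
    m, _, block_out, _ = B.shape
    C = np.zeros((o, m, batch, block_out), dtype=np.int32)
    for ko in range(n):          # outer reduction axis, moved all the way out
        for bo in range(o):      # axis[0]
            for co in range(m):  # axis[1]
                # axis[2], axis[3] and ki collapse into one tile-level GEMM
                gemm_tile(C[bo, co], A[bo, ko], B[co, ko])
    return C


C = reordered_matmul(A, B)

# Reference result without any particular loop order.
reference = np.einsum("onbi,mnci->ombc", A, B)
```

In this sketch the reordered loop nest gives the same result as the
reference, so the order itself does not change the math.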

I am not sure why we need to do this. When I commented out these lines,
the execution failed completely.

The computation has already been tiled to match the VTA tile sizes,
so why is this step still necessary?

Thanks