Code examples on prefetch?



Are there any code examples showing how to use the prefetch functionality? I found the description, but I’d like to know how to use it and in which cases it is beneficial.

I tried to apply it, but it doesn’t seem to help.



@were is the author of prefetch.


I have the same question. Was there a response somewhere from @were?


Can you elaborate on your question?

The key to using prefetch is the distance.
If the distance is too large, the fetched value will be evicted from the cache before it is actually used.
If the distance is too small, the fetch request will still be in flight when the value is needed.

However, in most cases it just slows the program down, because one more instruction has to be issued.
Some CPUs simply treat it as a no-op.
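
To make the distance trade-off concrete, here is a minimal sketch in plain C rather than TVM. It uses the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. The function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are made up for illustration, and the value 16 is an arbitrary starting point, not a recommendation for any particular CPU.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead of the current
 * index to prefetch. Too large and the line may be evicted before use;
 * too small and the load may not have completed yet. */
#define PREFETCH_DISTANCE 16

/* Sum an array while requesting data PREFETCH_DISTANCE elements ahead.
 * The prefetch is only a hint: it never changes the result, and some
 * targets may ignore it entirely. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (prepare for a read), locality = 3 (high temporal
             * locality, keep the line in cache if possible). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

For a simple sequential scan like this, the hardware prefetcher usually does the job already, so the extra instruction is likely to be pure overhead. A manual prefetch is more likely to pay off on access patterns the hardware cannot predict, and the distance has to be found by measuring on the target.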


Thank you for your response.

The target processor (Hexagon) has both a DSP and a vector multiplier (HVX), both of which benefited from pre-fetching in Halide, doing convolutions for example. It would be nice to use the TVM-based function to do something similar.

Yes, the distance will be critical. The folks working on Halide found the behavior you describe, but they also found a significant speed-up for well-chosen parameters.

I have not yet looked at the timing improvements using (your?) TVM prefetch() function. I am still trying to work out how it behaves. I see, for example, that if s is a schedule built on B,
where either A == B or B depends on A, then the TVM IR is of the form
prefetch(some address in A, 0, 3, 1).

Can you tell me how the TVM-level call is translated into this IR (depending on the relationship between A and B)? And what do the arguments 0, 3, 1 represent?