I see that we have a prefetch primitive in the IR, like Halide does.
Have we used prefetch in any of our schedules, for example on CPU or GPU?
Halide uses it in one of their examples like this:
denoised.compute_at(processed, yi).store_at(processed, yo)
    .prefetch(input, y, 2)
    .fold_storage(y, 16)
    .tile(x, y, x, y, xi, yi, 2*vec, 2)
    .vectorize(xi)
    .unroll(yi);
Thanks.