[IR] Do any of our schedules use prefetch?


I see that we have a prefetch primitive in our IR, similar to Halide's. Have we used prefetch in any of our schedules, for example for CPU or GPU? Halide uses it in one of its examples like this:

        denoised.compute_at(processed, yi).store_at(processed, yo)
            .prefetch(input, y, 2)
            .fold_storage(y, 16)
            .tile(x, y, x, y, xi, yi, 2*vec, 2)
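To make the idea concrete, here is a hand-written C++ analogue of what `.prefetch(input, y, 2)` asks for. While iteration `y` of the loop runs, it requests the input rows that iteration `y + 2` will need, so they are already in cache when that iteration starts. This is a sketch under stated assumptions, not the code Halide generates. The 3-row blur kernel and the name `blur_rows` are hypothetical stand-ins for the `denoised` stage. `__builtin_prefetch` is a GCC/Clang builtin, not standard C++. The 64-byte cache-line size is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3-row vertical blur over a row-major image (stand-in for "denoised").
// distance == 0 disables explicit prefetching; distance == 2 mirrors prefetch(input, y, 2).
std::vector<float> blur_rows(const std::vector<float>& in, std::size_t width,
                             std::size_t height, std::size_t distance) {
    assert(in.size() == width * height);
    assert(height >= 3);
    std::vector<float> out(width * (height - 2));
    for (std::size_t y = 0; y + 2 < height; ++y) {
        if (distance > 0) {
            // Iteration y + distance reads rows y+distance .. y+distance+2;
            // the only row not touched yet is y + 2 + distance.
            std::size_t ahead = y + 2 + distance;
            if (ahead < height) {
                const float* row = &in[ahead * width];
                // One prefetch per (assumed) 64-byte cache line: read access, high locality.
                for (std::size_t x = 0; x < width; x += 64 / sizeof(float)) {
                    __builtin_prefetch(row + x, /*rw=*/0, /*locality=*/3);
                }
            }
        }
        // The actual computation is identical with or without prefetching.
        for (std::size_t x = 0; x < width; ++x) {
            out[y * width + x] = (in[y * width + x] + in[(y + 1) * width + x] +
                                  in[(y + 2) * width + x]) / 3.0f;
        }
    }
    return out;
}
```

A prefetch is only a hint, so `blur_rows(img, w, h, 0)` and `blur_rows(img, w, h, 2)` return identical results. Only memory timing can differ.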

@tqchen @merrymercy



Many architectures, such as ARMv8, have hardware auto-prefetching, so explicit prefetch instructions like `pld` could actually hurt performance. Maybe we could expose prefetch through `define_knob` as an AutoTVM config option.
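The knob idea can be sketched without TVM. The candidate prefetch distances act as the knob's choices, with 0 meaning "no explicit prefetch, rely on the hardware prefetcher". Each candidate is timed, and the fastest one is kept. This is a plain C++ illustration of the tuning idea, not AutoTVM's API. `stream_sum` and `pick_prefetch_distance` are hypothetical names. `__builtin_prefetch` is a GCC/Clang builtin, and the 64-byte line size is assumed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Sequential sum: the access pattern hardware prefetchers already handle well.
// distance == 0 means no explicit prefetch instructions are issued.
float stream_sum(const std::vector<float>& data, std::size_t distance) {
    const std::size_t floats_per_line = 64 / sizeof(float);  // assumed 64-byte lines
    float sum = 0.0f;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (distance > 0 && i % floats_per_line == 0 && i + distance < data.size()) {
            __builtin_prefetch(&data[i + distance], /*rw=*/0, /*locality=*/1);
        }
        sum += data[i];  // same summation order for every distance
    }
    return sum;
}

// Time each candidate once and return the fastest, like a tuner choosing a knob value.
std::size_t pick_prefetch_distance(const std::vector<float>& data,
                                   const std::vector<std::size_t>& candidates) {
    assert(!candidates.empty());
    std::size_t best = candidates.front();
    auto best_time = std::chrono::steady_clock::duration::max();
    for (std::size_t d : candidates) {
        auto start = std::chrono::steady_clock::now();
        volatile float sink = stream_sum(data, d);  // keep the call from being optimized away
        (void)sink;
        auto elapsed = std::chrono::steady_clock::now() - start;
        if (elapsed < best_time) {
            best_time = elapsed;
            best = d;
        }
    }
    return best;
}
```

A real tuner would repeat each measurement and run on the target device. The point is that "off" (0) stays a valid choice, so the search can fall back to no explicit prefetch on cores where the hardware prefetcher already does the job.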