Where does cache_read with local scope copy data to?

junrushao · November 27, 2018, 8:53pm

On CUDA, what’s the benefit of using cache_read to load something into local memory? I assume accessing local memory is as slow as accessing global memory, because both of them reside in off-chip DRAM.

(I copy-pasted the question I asked privately, and the answer I got, to the forum so others can benefit from it, because it is a clarification question.)

junrushao · November 27, 2018, 8:54pm

The answer I got:

“local roughly means register, because compiler will lift it.”

junrushao · November 27, 2018, 8:58pm

We know that if the size of the local memory is known at compile time and it is small enough to fit in registers, nvcc is capable of lifting it to registers. But if the local memory is large, or its size cannot be determined at compile time, it may spill to cache or off-chip DRAM.
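
To make this concrete, here is a minimal hand-written CUDA sketch (not TVM-generated code; the kernel names, TILE, and the test data are illustrative assumptions). The first kernel copies a small, fixed-size tile into a local array and indexes it only with unrolled constants, which is the pattern nvcc can usually keep in registers. The second indexes the same kind of array with a runtime value, a related case where the compiler often cannot use registers and may fall back to local memory.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 4;  // small compile-time size (illustrative choice)

// Fixed-size local array, constant indices after unrolling:
// nvcc can typically keep a_local entirely in registers.
__global__ void sum_tile_static(const float* a, float* out, int n_tiles) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n_tiles) return;
    float a_local[TILE];
    #pragma unroll
    for (int i = 0; i < TILE; ++i) a_local[i] = a[t * TILE + i];  // read into local
    float acc = 0.0f;
    #pragma unroll
    for (int i = 0; i < TILE; ++i) acc += a_local[i];
    out[t] = acc;
}

// Same local array, but indexed with a value only known at run time.
// The compiler may have to place a_local in local memory here.
// The host guarantees 0 <= idx[t] < TILE.
__global__ void pick_dynamic(const float* a, const int* idx, float* out, int n_tiles) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n_tiles) return;
    float a_local[TILE];
    for (int i = 0; i < TILE; ++i) a_local[i] = a[t * TILE + i];
    out[t] = a_local[idx[t]];
}

int main() {
    const int n_tiles = 256;
    const int n = n_tiles * TILE;
    std::vector<float> h_a(n), h_sum(n_tiles), h_pick(n_tiles);
    std::vector<int> h_idx(n_tiles);
    for (int i = 0; i < n; ++i) h_a[i] = static_cast<float>(i % 7);
    for (int t = 0; t < n_tiles; ++t) h_idx[t] = t % TILE;

    float *d_a, *d_sum, *d_pick;
    int* d_idx;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_sum, n_tiles * sizeof(float));
    cudaMalloc(&d_pick, n_tiles * sizeof(float));
    cudaMalloc(&d_idx, n_tiles * sizeof(int));
    cudaMemcpy(d_a, h_a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), n_tiles * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n_tiles + threads - 1) / threads;
    sum_tile_static<<<blocks, threads>>>(d_a, d_sum, n_tiles);
    pick_dynamic<<<blocks, threads>>>(d_a, d_idx, d_pick, n_tiles);

    // These blocking copies also wait for the kernels to finish.
    cudaMemcpy(h_sum.data(), d_sum, n_tiles * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_pick.data(), d_pick, n_tiles * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare against a CPU reference (values are small integers, so exact).
    bool ok = true;
    for (int t = 0; t < n_tiles; ++t) {
        float ref = 0.0f;
        for (int i = 0; i < TILE; ++i) ref += h_a[t * TILE + i];
        if (h_sum[t] != ref) ok = false;
        if (h_pick[t] != h_a[t * TILE + h_idx[t]]) ok = false;
    }
    printf("%s\n", ok ? "results match" : "mismatch");

    cudaFree(d_a); cudaFree(d_sum); cudaFree(d_pick); cudaFree(d_idx);
    return ok ? 0 : 1;
}
```

Whether a given array actually ends up in registers depends on the compiler version and target architecture, so it is worth checking rather than assuming. Compiling with `nvcc -Xptxas -v` prints the per-kernel register count together with stack frame and spill sizes, which gives a quick indication of whether local memory is being used.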