[Hybrid] variables on CUDA should have 'local' scope

Hi @were

According to the doc,

Variables
Because there is no variables in HalideIR, all the mutatable variables will be lowered to an array with size 1. It takes the first store of a variable as its declaration.

When I write a hybrid script for the CUDA backend, the intermediate variables are translated to buffers of shape (1,) in global scope, which causes the following error in CodeGenCUDA::PrintStorageScope:
Check failed: scope != "global" (global vs. global)

A workaround might be to use allocate to create these variables manually. But I think it would be better if we didn't have to create them explicitly when they only serve as intermediate results and we don't write to them.
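
To make the workaround concrete, here is a rough sketch of what I mean. It assumes the hybrid intrinsics take the form allocate(shape, dtype, scope) and output_tensor(shape, dtype). The two helpers below are hypothetical NumPy stand-ins that I define only so the snippet runs without TVM. They are not the real TVM functions, and the kernel name scale_and_shift is made up. In a real script the function body would sit under the hybrid script decorator, and on CUDA the loop would also need to be bound to threads.

```python
import numpy as np


# Hypothetical stand-ins for the hybrid script intrinsics (assumed signatures).
# They only emulate the behaviour on the host; the scope argument is ignored here.
def allocate(shape, dtype="float32", scope="global"):
    return np.zeros(shape, dtype=dtype)


def output_tensor(shape, dtype="float32"):
    return np.zeros(shape, dtype=dtype)


def scale_and_shift(a):
    out = output_tensor(a.shape, "float32")
    # Workaround: instead of writing `tmp = a[i] * 2.0` (which is lowered to a
    # size-1 array in global scope), declare the size-1 buffer explicitly and
    # request 'local' scope for it.
    tmp = allocate((1,), "float32", "local")
    for i in range(a.shape[0]):
        tmp[0] = a[i] * 2.0
        out[i] = tmp[0] + 1.0
    return out


result = scale_and_shift(np.array([1.0, 2.0, 3.0], dtype="float32"))
```

This works, but every scalar temporary has to be spelled out as a (1,) buffer and indexed with [0], which is the boilerplate I would like to avoid.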