[Hybrid] variables on CUDA should have 'local' scope


Hi @were

According to the doc,

> Because there are no variables in HalideIR, all the mutable variables will be lowered to an array of size 1. It takes the first store of a variable as its declaration.
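To make that rule concrete, here is a toy Python model of it. This is not TVM code: the function name `lower_scalars` and the pseudo-IR strings it produces are made up for illustration. Each scalar is rewritten to a one-element buffer, an allocation is emitted at its first store, and the allocation is tagged with a storage scope, which defaults to `'global'` here to mirror what I'm seeing:

```python
from typing import List, Tuple

# (target scalar variable, tokens of the right-hand side); a hypothetical toy IR
Stmt = Tuple[str, List[str]]

def lower_scalars(stmts: List[Stmt], scope: str = "global") -> List[str]:
    """Toy model: rewrite every scalar variable `x` to a 1-element buffer `x[0]`.

    The first store to `x` is treated as its declaration, so an allocation
    line is emitted just before it, tagged with the given storage scope.
    """
    scalars = {target for target, _ in stmts}
    declared = set()
    out = []
    for target, rhs in stmts:
        if target not in declared:
            # First store of this variable: this is where it gets "declared".
            declared.add(target)
            out.append(f"allocate {target}[1] in '{scope}'")
        # Every token naming a scalar variable becomes index 0 of its buffer.
        rhs_lowered = " ".join(f"{tok}[0]" if tok in scalars else tok for tok in rhs)
        out.append(f"{target}[0] = {rhs_lowered}")
    return out

for line in lower_scalars([("acc", ["0"]), ("acc", ["acc", "+", "1"])]):
    print(line)
# allocate acc[1] in 'global'
# acc[0] = 0
# acc[0] = acc[0] + 1
```

Passing `scope="local"` produces the same statements with a `'local'` allocation, i.e. per-thread storage, which is presumably what these temporaries want on CUDA.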

When I write hybrid script for the CUDA backend, the intermediate variables are translated into buffers of shape `(1,)` in global scope, which causes the error `Check failed: scope != "global" (global vs. global)` in `CodeGenCUDA::PrintStorageScope`.

A workaround might be to use `allocate` to create each such variable manually. But I think it would be better if we didn't need to create these variables by hand when they only hold intermediate results and are never written out; they should get 'local' scope automatically.
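Here is a hypothetical sketch of the automatic behaviour I'm suggesting, again as a toy Python model rather than an actual TVM pass. The function `localize_intermediates` and the notion of an "escaping" variable are my own assumptions: any size-1 buffer whose value is not needed outside the thread that writes it is switched to `'local'`, and the rest keep their current scope.

```python
from typing import Dict, Set

def localize_intermediates(allocs: Dict[str, str], escaping: Set[str]) -> Dict[str, str]:
    """Toy rewrite (hypothetical, not TVM's implementation).

    allocs:   lowered size-1 buffer name -> current storage scope
    escaping: names whose values must be visible outside the writing thread
    Returns a new mapping in which every non-escaping buffer is 'local'.
    """
    return {
        name: scope if name in escaping else "local"
        for name, scope in allocs.items()
    }

# Two intermediates that the current lowering would place in 'global'.
print(localize_intermediates({"acc": "global", "tmp": "global"}, escaping=set()))
# {'acc': 'local', 'tmp': 'local'}
```

With a rule like this, the manual `allocate` workaround would only be needed for the rare variable that really must live outside per-thread storage.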