Reason for the existence of env_threads

Hi all,

env_threads is used with the scan operation to ensure that a single kernel is generated, so that persistent RNNs can be implemented in TVM. Why is a separate scheduling primitive needed for this? Shouldn't the same transformation be possible with a combination of loop fusion, loop interchange, and binding to GPU threads?