How to limit `VGPRs` and `SGPRs` for ROCm

How to limit VGPRs and SGPRs for ROCm?

It’s been a while, but when you set the maximum workgroup size, LLVM limits the number of registers per thread so that a workgroup of that size can be accommodated. This PR sets the maximum workgroup size for the AMDGPU codegen.
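To give a feel for why the workgroup size bounds the register count, here is a rough sketch of the arithmetic. It is not the code from the PR or from LLVM. The hardware numbers are assumptions for a GCN-style GPU (64-wide wavefronts, 4 SIMDs per compute unit, 256 VGPRs per lane per SIMD, allocation in steps of 4), and the function name is made up for illustration.

```python
import math

# Assumed GCN-style hardware parameters (illustrative, not queried from a device).
WAVEFRONT_SIZE = 64        # threads per wavefront
SIMDS_PER_CU = 4           # SIMD units per compute unit
VGPRS_PER_SIMD_LANE = 256  # VGPRs available to each lane on one SIMD
VGPR_GRANULE = 4           # VGPRs are allocated in multiples of this


def max_vgprs_for_workgroup(max_workgroup_size: int) -> int:
    """Hypothetical helper: VGPR budget per thread for a given workgroup size."""
    # A workgroup must fit on one compute unit, so its wavefronts are
    # spread over that unit's SIMDs.
    waves = math.ceil(max_workgroup_size / WAVEFRONT_SIZE)
    waves_per_simd = math.ceil(waves / SIMDS_PER_CU)
    # Wavefronts resident on the same SIMD share its register file.
    budget = VGPRS_PER_SIMD_LANE // waves_per_simd
    # Round down to the allocation granule.
    return budget - budget % VGPR_GRANULE


if __name__ == "__main__":
    for size in (256, 512, 1024):
        print(size, max_vgprs_for_workgroup(size))
```

Under these assumptions a maximum workgroup size of 1024 leaves 64 VGPRs per thread, while 256 leaves the full 256, so lowering the declared maximum is what frees up registers.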

@t-vi Thanks, so what is the user interface in TVM to limit them?

Last time I looked, it was not configurable; instead, the codegen queries the device properties.

Best regards