(migrating issue #1021 here) @adityaatluri
To summarize, it looks like the MIOpen implementation for the first layer of resnet (7x7 conv) uses SGPRs + vector operations to do batching efficiently (sharing filter weights in SGPRs across a workgroup).
In OpenCL this is done with the `__constant` qualifier, but it seems that this is not currently supported in TVM (for the ROCm backend). Most recently we were wondering whether this could be supported using an LLVM memory scope (which seems to cover the `__constant` case).
Any other ideas on possible support for this case? Note that this technique is used for the batched inference case, not for batch_size=1.
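
To make the access pattern concrete, here is a rough pure-Python sketch of the idea as I understand it. It is not MIOpen's kernel and does not use any TVM API. The function name `conv_batched_shared_weights`, the single-channel layout, and the `weight_loads` counter are all made up for illustration. Each filter weight is read once (standing in for a value held in an SGPR) and then applied to every batch lane (standing in for the vector lanes):

```python
def conv_batched_shared_weights(inputs, weights, stride=1):
    """Valid (no padding) single-channel 2D convolution over a batch.

    inputs:  list of N images, each an H x W list of lists
    weights: K x K filter (list of lists), shared by every image
    Returns (out, weight_loads), where out is N x OH x OW and
    weight_loads counts how many times a filter weight was read.
    """
    n = len(inputs)
    h, w = len(inputs[0]), len(inputs[0][0])
    k = len(weights)
    oh = (h - k) // stride + 1
    ow = (w - k) // stride + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(n)]
    weight_loads = 0

    for oy in range(oh):
        for ox in range(ow):
            # One accumulator per batch lane (the "vector" side).
            acc = [0.0] * n
            for ky in range(k):
                for kx in range(k):
                    # Read the weight once; it is uniform across lanes
                    # (the "scalar" side, hypothetically an SGPR).
                    wgt = weights[ky][kx]
                    weight_loads += 1
                    iy = oy * stride + ky
                    ix = ox * stride + kx
                    # Apply the same scalar weight to every batch lane.
                    for lane in range(n):
                        acc[lane] += wgt * inputs[lane][iy][ix]
            for lane in range(n):
                out[lane][oy][ox] = acc[lane]
    return out, weight_loads
```

With a batch of N images the weight is read once per filter tap per output position, instead of N times if each image were processed separately. That is also why this only pays off for batched inference: with batch_size=1 there is nothing to amortize the weight read over.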