GEMM GPU template?

zwang-asapp · August 16, 2019, 10:57pm

Hi all, is there an official GPU template for single-precision and half-precision GEMM? I see the template here https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py but I wonder what the best way is to use AutoTVM with it.

zwang-asapp · August 21, 2019, 5:05pm

I would assume you would just make a config object, and then replace all the splits and bindings with the appropriate calls in config. Is that right?

haichen · August 21, 2019, 6:55pm

We really need an AutoTVM template for GPU GEMM. Could you help add it?

And yes, we usually just replace the splits and define them in the AutoTVM config, and sometimes define the loop order and the maximum unroll step in the config as well. I recommend hardcoding the GPU binding in the template instead of adding it to the config, since it's quite important and we usually just bind the outer loops to the block and thread indices.
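
To make the idea concrete, here is a rough plain-Python sketch of what such a search space could look like. It is not the AutoTVM API and does not import TVM; it only illustrates the split between tuned knobs and fixed bindings. The helper `split_candidates`, the knob names, the 64-element axis extent, and the unroll values are all assumptions chosen for illustration. In a real template, the corresponding calls would be along the lines of `cfg.define_split` and `cfg.define_knob`.

```python
from itertools import product


def split_candidates(extent, num_outputs):
    """Enumerate ordered factorizations of `extent` into `num_outputs` parts.

    Hypothetical helper: each tuple is one way to split a loop of length
    `extent` into nested loops whose lengths multiply back to `extent`.
    """
    factors = [f for f in range(1, extent + 1) if extent % f == 0]
    candidates = []
    for inner in product(factors, repeat=num_outputs - 1):
        inner_product = 1
        for f in inner:
            inner_product *= f
        # The outermost part is whatever remains, if it divides evenly.
        if extent % inner_product == 0:
            candidates.append((extent // inner_product,) + inner)
    return candidates


# Tuned knobs (assumed names and sizes): tile splits and unroll limit.
space = {
    "tile_y": split_candidates(64, 4),  # block, vthread, thread, inner
    "tile_x": split_candidates(64, 4),
    "tile_k": split_candidates(64, 2),  # reduction: outer, inner
    "auto_unroll_max_step": [0, 512, 1500],
}

# Not tuned: the outer three parts of each spatial split are always
# bound in this fixed order, so the binding never enters the config.
FIXED_BINDINGS = ("blockIdx", "vthread", "threadIdx")

total = 1
for choices in space.values():
    total *= len(choices)
print("configurations to search:", total)
```

Keeping the bindings out of `space` keeps the search from spending trials on schedules that bind the wrong loops, which is the point of hardcoding them.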

zwang-asapp · August 21, 2019, 10:22pm

Can people share their experiences with tuning GEMM on GPUs? Realistically, what kind of performance relative to cuBLAS should I expect?