Some questions about the AMD platform

As we know, the goal of TVM is to build an end-to-end system that supports multiple hardware platforms, such as NVIDIA, ARM, AMD and so on. However, according to the public data (Automatic Kernel Optimization for Deep Learning on All Hardware Platforms), TVM is not as strong on AMD as on the other platforms. Has anyone analyzed the reason? If anyone is interested in AMD, we can discuss further and maybe do something together.

Yeah, my guess is LLVM's AMDGPU backend is not mature enough to generate optimized asm (compared to NVCC). Note that MIOpen convolution is super-optimized, hand-written asm (to work around compiler limitations) written by the people who know their hardware best. So I think the AutoTVM result is very impressive nonetheless.

Thanks for your reply! It is really helpful.
Yeah, MIOpen is really outstanding on AMD GPUs, but TVM performs well on the other platforms. If LLVM is the reason TVM is not good enough on AMD GPUs, then it will be difficult to further improve TVM's performance there for now, am I right? I know you are really experienced with the AMD GPU backend, so could I ask your advice on what we should do to contribute to TVM, especially on AMD GPUs? I will be grateful for any reply!

Well, I'm not an expert on the AMD GPU backend :slight_smile: I just helped bring it up.

You are right, we are probably blocked by LLVM's inefficient codegen, so it is very difficult to decide where to go from here. We could tweak convolution schedules for AMDGPU (currently we use the CUDA schedules as-is), but there are few resources on how to optimize for AMDGPU. I was told that they are working on documentation, but I don't expect it to come soon.

If you are really interested in the AMDGPU backend, you can join the "rocm" channel at https://tvmai.slack.com/ to catch up on what has happened since last year.

Thank you! You are really kind. But I was not able to access this website; I don't have an email address from @cmu.edu, @cs.washington.edu or @berkeley.edu, which is unfortunate for me. Anyway, thank you very much!

Me neither. You can send an email to @tqchen and he can invite you. See https://github.com/dmlc/tvm/issues/2174

Thanks a million! I will send the email right away. I hope we can communicate more in the future.

As of yesterday, @masahi has become a committer and also has permission to invite people :slight_smile: I will send out a Slack invite. However, let us not use the Slack channel for general discussion; always use the forum for that. I will summarize my current view of the AMD picture here in this thread.

cc @adityaatluri, and see also

To summarize my view on how we can improve AMD perf: I personally think there are still ways to improve things. Even if LLVM is the problem, we can work around it with inline asm kernels via tensorization: https://docs.tvm.ai/tutorials/language/tensorize.html#sphx-glr-tutorials-language-tensorize-py

Let us coordinate discussions on the discuss forum.
