Implement conv2d with tensor core-2nd question

zhoujqhappy · December 10, 2018, 9:18am

In my previous RFC (Implement Conv2d using Tensor Core), I described my plan to implement a convolution layer with Tensor Cores in TVM. By adding new intrinsics for the wmma APIs, I no longer need those APIs in my own header file. (I used to run my program by hacking codegen_cuda.cc to add one C++ header file.) However, I still need to integrate three more functions into TVM. They are used to efficiently **1. move data from global memory to shared memory, 2. perform im2col, and 3. move the computation result back from shared memory to global memory.** These three functions are very complicated and can hardly be written using schedules or IR, but they are efficient: my single-layer conv2d runs at 90%-130% of the performance of TensorRT's convolution (which also uses Tensor Cores).
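For readers unfamiliar with step 2, here is a minimal host-side sketch of what im2col computes. It is not the device function described in the post. It assumes a single image in C x H x W layout, stride 1, and no padding, and the name `Im2ColReference` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, NOT the author's CUDA device function.
// Assumptions: one image in C x H x W layout, stride 1, no padding.
// Output is a row-major matrix with (channels * kernel_h * kernel_w) rows
// and (out_h * out_w) columns, so conv2d becomes a matrix multiply.
std::vector<float> Im2ColReference(const std::vector<float>& input,
                                   int channels, int height, int width,
                                   int kernel_h, int kernel_w) {
  const int out_h = height - kernel_h + 1;
  const int out_w = width - kernel_w + 1;
  std::vector<float> cols(static_cast<std::size_t>(channels) * kernel_h *
                          kernel_w * out_h * out_w);
  std::size_t idx = 0;
  for (int c = 0; c < channels; ++c) {
    for (int kh = 0; kh < kernel_h; ++kh) {
      for (int kw = 0; kw < kernel_w; ++kw) {
        // One output row per (channel, kernel offset) triple.
        for (int oh = 0; oh < out_h; ++oh) {
          for (int ow = 0; ow < out_w; ++ow) {
            cols[idx++] = input[(c * height + oh + kh) * width + ow + kw];
          }
        }
      }
    }
  }
  return cols;
}
```

Multiplying the flattened filter matrix by this output gives the convolution result, which is the matrix shape that wmma-style tile multiplication expects.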

However, these are all device functions, called from the CUDA kernel that performs the convolution, so I cannot simply wrap them as packed functions (because they are not host functions). Intrinsic functions may not be a good idea either, because that would put several hundred lines of string literals into the codegen_c file. The best approach I can think of is to add a flag to the builder and put this header file in a specific folder inside TVM. Once the flag is set to true, codegen_cuda.cc will include my header file in the generated code. (This is probably also a very dumb idea, anyway.)
@vinx13 @tqchen

zhoujqhappy · December 10, 2018, 9:19am

cc @merrymercy @masahi

merrymercy · December 11, 2018, 3:56am

I think this method is okay.

We do something similar for int8 and fp16 header files.

https://github.com/dmlc/tvm/blob/cb70da1ba8382c1ed2762c6fca3046884986a3fd/src/codegen/codegen_cuda.cc#L33


vid_global_barrier_expect_ = GetUniqueName("__barrier_expect");
CHECK_EQ(vid_global_barrier_state_, runtime::symbol::tvm_global_barrier_state);
}


void CodeGenCUDA::AddFunction(LoweredFunc f) {
  this->stream << "extern \"C\" __global__ ";
  CodeGenC::AddFunction(f);
}

std::string CodeGenCUDA::Finish() {
  if (enable_fp16_) {
    decl_stream << "#include <cuda_fp16.h>\n";
  }

  if (enable_int8_) {
    decl_stream << "#include <sm_61_intrinsics.h>\n";
  }

  return CodeGenC::Finish();
}
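The flag proposed in the first post could follow the same pattern as the `enable_fp16_` and `enable_int8_` checks. Below is a minimal, self-contained sketch of that idea. The class `SketchCodeGenCUDA`, the method `EnableTensorCoreHelpers`, and the header name used in the usage note are hypothetical stand-ins for illustration, not TVM's actual API.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the CUDA code generator; NOT TVM's real class.
class SketchCodeGenCUDA {
 public:
  void EnableFp16() { enable_fp16_ = true; }

  // Hypothetical flag: ask for a helper header to be included in the output.
  void EnableTensorCoreHelpers(const std::string& header_name) {
    enable_tensor_core_helpers_ = true;
    helper_header_ = header_name;
  }

  // Append generated kernel source to the body stream.
  void AddKernelSource(const std::string& source) { stream_ << source; }

  // Emit the conditional includes first, then the kernel body.
  std::string Finish() const {
    std::ostringstream decl_stream;
    if (enable_fp16_) {
      decl_stream << "#include <cuda_fp16.h>\n";
    }
    if (enable_tensor_core_helpers_) {
      decl_stream << "#include \"" << helper_header_ << "\"\n";
    }
    return decl_stream.str() + stream_.str();
  }

 private:
  bool enable_fp16_ = false;
  bool enable_tensor_core_helpers_ = false;
  std::string helper_header_;
  std::ostringstream stream_;
};
```

With the flag set, for example via `EnableTensorCoreHelpers("tensor_core_conv2d.h")`, the generated source starts with the include line, so kernels can call the device functions defined in that header. With the flag unset, the output is unchanged. The header would still have to be on the include path of whatever compiles the generated CUDA source.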