Using external lib (cuDNN) with INT8 ops

Do we currently support this use case? It looks like the operator selection/profiling part of cuDNN works, but at runtime I get CUDNN_STATUS_BAD_PARAM. CC @masahi

No, it looks like we are passing a hard-coded CUDNN_DATA_FLOAT as the dtype to the cuDNN calls :slight_smile: I’ll fix this soon.
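Roughly, the fix is to derive the cuDNN enum from the DLPack dtype of the input tensor instead of hard-coding it. A minimal sketch of that mapping (the helper name here is made up; TVM's own helper is CuDNNDataType::DLTypeToCuDNNType, visible in the snippet later in this thread):

#include <cudnn.h>
#include <dlpack/dlpack.h>
#include <stdexcept>

// Map a DLPack dtype to the corresponding cuDNN tensor data type.
cudnnDataType_t ToCuDNNDataType(const DLDataType& dtype) {
  if (dtype.code == kDLFloat && dtype.bits == 32) return CUDNN_DATA_FLOAT;
  if (dtype.code == kDLFloat && dtype.bits == 16) return CUDNN_DATA_HALF;
  if (dtype.code == kDLInt && dtype.bits == 8) return CUDNN_DATA_INT8;
  throw std::invalid_argument("dtype not supported by this sketch");
}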


@eqy is cuDNN with int8 supposed to work on a Pascal card? It seems I can’t even create a convolution descriptor when the dtype is int8 (cudnnSetConvolution2dDescriptor fails). Algo selection is done in fp32 (hard-coded, needs to be fixed); with int8 nothing works.
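As far as I know, cuDNN int8 convolutions require compute capability 6.1 or later, which Pascal cards like the 1080 Ti do have, so the card itself should not be the problem. My suspicion is the compute type: the last argument of cudnnSetConvolution2dDescriptor is the convolution compute type, which for int8 convolutions must be CUDNN_DATA_INT32. A minimal standalone check (my own sketch, not TVM code):

#include <cudnn.h>
#include <cstdio>

int main() {
  cudnnConvolutionDescriptor_t conv_desc;
  cudnnCreateConvolutionDescriptor(&conv_desc);
  // For int8 convolutions the compute type must be CUDNN_DATA_INT32;
  // passing CUDNN_DATA_INT8 here appears to be what yields BAD_PARAM.
  cudnnStatus_t st = cudnnSetConvolution2dDescriptor(
      conv_desc, /*pad_h=*/1, /*pad_w=*/1, /*stride_h=*/1, /*stride_w=*/1,
      /*dilation_h=*/1, /*dilation_w=*/1, CUDNN_CROSS_CORRELATION,
      /*computeType=*/CUDNN_DATA_INT32);
  printf("%s\n", cudnnGetErrorString(st));
  cudnnDestroyConvolutionDescriptor(conv_desc);
  return 0;
}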

I do not know whether cuDNN works, but TVM schedules do (I just wanted to try something quickly before doing proper tuning). I think the TVM performance is reasonable (~35 TOPS at batch size 128 on a 1080 Ti), so this is not urgent.

We did recently get some 2080 Ti cards too, though :slight_smile:

Any updates here? I hit the same issue with an int8 quantized model. My TVM version is 0.6 (a build from about one month ago, not the latest):

tvm\src\contrib\cudnn\conv_forward.cc:69: Check failed: e == CUDNN_STATUS_SUCCESS (3 vs. 0) : cuDNN: CUDNN_STATUS_BAD_PARAM

TVM_REGISTER_GLOBAL("tvm.contrib.cudnn.conv2d.forward")
.set_body([](TVMArgs args, TVMRetValue *ret) {
  int mode = args[0];
  int format = args[1];
  int algo = args[2];
  int pad_h = args[3];
  int pad_w = args[4];
  int stride_h = args[5];
  int stride_w = args[6];
  int dilation_h = args[7];
  int dilation_w = args[8];
  DLTensor *x = args[9];
  DLTensor *w = args[10];
  DLTensor *y = args[11];
  CuDNNThreadEntry* entry_ptr = CuDNNThreadEntry::ThreadLocal();
  // Set Mode
  entry_ptr->conv_entry.mode = static_cast<cudnnConvolutionMode_t>(mode);
  // Set Format
  entry_ptr->conv_entry.tensor_format = static_cast<cudnnTensorFormat_t>(format);
  // Set Algo
  entry_ptr->conv_entry.fwd_algo = static_cast<cudnnConvolutionFwdAlgo_t>(algo);
  // Set Ctx
  entry_ptr->conv_entry.ctx = x->ctx;
  // Set Data Type
  entry_ptr->conv_entry.data_type = CuDNNDataType::DLTypeToCuDNNType(x->dtype);
  // Set Desc
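  // NOTE: the last argument to cudnnSetConvolution2dDescriptor is the
  // convolution *compute* type; for int8 convolutions cuDNN expects
  // CUDNN_DATA_INT32 there, so reusing the int8 tensor dtype directly
  // looks like the BAD_PARAM trigger.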
  CUDNN_CALL(cudnnSetConvolution2dDescriptor(entry_ptr->conv_entry.conv_desc,
                                             pad_h,
                                             pad_w,
                                             stride_h,
                                             stride_w,
                                             dilation_h,
                                             dilation_w,
                                             entry_ptr->conv_entry.mode,
                                             entry_ptr->conv_entry.data_type));

Line 69 is the call to cudnnSetConvolution2dDescriptor.
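My guess is that the compute type is the problem: int8 convolutions accumulate in int32, so the descriptor cannot simply reuse the tensor dtype. A sketch of the kind of change I would expect (untested, and the real fix may differ):

cudnnDataType_t compute_type = entry_ptr->conv_entry.data_type;
if (compute_type == CUDNN_DATA_INT8) {
  // int8 convolutions must declare an int32 compute type.
  compute_type = CUDNN_DATA_INT32;
}
CUDNN_CALL(cudnnSetConvolution2dDescriptor(entry_ptr->conv_entry.conv_desc,
                                           pad_h, pad_w, stride_h, stride_w,
                                           dilation_h, dilation_w,
                                           entry_ptr->conv_entry.mode,
                                           compute_type));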

I’m trying to add TensorCore support to the cuDNN and cuBLAS backends. I found and fixed this on my local branch and will submit a PR soon. Please wait a little longer.
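For reference, opting into Tensor Cores in cuDNN is a per-descriptor setting via the math type (the cuBLAS analogue is cublasSetMathMode with CUBLAS_TENSOR_OP_MATH). A minimal sketch, not necessarily how my PR wires it up:

// Request Tensor Core kernels for this convolution (cuDNN 7+); cuDNN
// falls back to ordinary kernels when the shape/dtype is not eligible.
CUDNN_CALL(cudnnSetConvolutionMathType(entry_ptr->conv_entry.conv_desc,
                                       CUDNN_TENSOR_OP_MATH));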


That would be great work!