CUDNN tensorcore support has wrong results and strange timing for fp16 and int8

After PR #4353 we can run Tensor Core based convolutions through cuDNN in TVM for fp16 and int8. But when I run the test file test_cudnn.py, the fp16 convolution intermittently gives wrong results, and the reported timing is always -1 ms. I wonder what causes these strange results. @Hzfengsy @masahi

Here are the results when I ran verify_conv2d("float16", "float32", tensor_format=1) on a Tesla T4 GPU. I changed the input shape as follows:

    in_channel = 512
    out_channel = 512
    filter_h = 3
    filter_w = 3
    pad_h = 1
    pad_w = 1
    stride_h = 1
    stride_w = 1
    dilation_h = 1
    dilation_w = 1
    batch = 1
    height = 7
    weight = 7

Sometimes it gave me a mismatch error like this:

Mismatched elements: 1 / 25088 (0.00399%)
Max absolute difference: 8.17340421e-05
Max relative difference: 0.03795133
 x: array([[[[-13.311087,  26.494438, -25.143475, ...,  11.120489,
            0.849933,  -5.120694],
         [-10.676369, -19.9305  , -11.853168, ...,  -8.573727,...
 y: array([[[[-13.311075,  26.494409, -25.143463, ...,  11.120492,
            0.849936,  -5.120693],
         [-10.676372, -19.930494, -11.853153, ...,  -8.57373 ,...
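For context, here is a minimal sketch of how a single element can fail a relative-tolerance check while the rest of the tensor passes. It reimplements the inequality that numpy.testing.assert_allclose documents, |actual - desired| <= atol + rtol * |desired|, in plain Python. The `small` pair and both tolerance values are hypothetical and chosen only for illustration; the tolerances actually used in test_cudnn.py may differ.

```python
def within_tolerance(actual, desired, rtol, atol=0.0):
    """Inequality documented for numpy.testing.assert_allclose:
    |actual - desired| <= atol + rtol * |desired|."""
    return abs(actual - desired) <= atol + rtol * abs(desired)

# Pair copied from the printed arrays above (x first, y second).
large = (26.494438, 26.494409)
# Hypothetical near-zero output: similar absolute error, tiny magnitude.
small = (0.002154, 0.002072)

rtol = 1e-3  # hypothetical tolerance, not necessarily the test's value

large_ok = within_tolerance(*large, rtol=rtol)   # error is tiny relative to ~26
small_ok = within_tolerance(*small, rtol=rtol)   # same-sized error, value near zero
small_ok_with_atol = within_tolerance(*small, rtol=rtol, atol=1e-3)

print(large_ok, small_ok, small_ok_with_atol)
```

If the one mismatched element is a near-zero output like this, the failure could be ordinary fp16 rounding noise rather than a wrong algorithm, but the log alone does not confirm that.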

When the results are correct, the timing is strange:

[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:344:  CUDNN Found 8 fwd algorithms, choosing CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          0) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          1) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          2) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          3) CUDNN_CONVOLUTION_FWD_ALGO_GEMM - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          4) CUDNN_CONVOLUTION_FWD_ALGO_DIRECT - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          5) CUDNN_CONVOLUTION_FWD_ALGO_FFT - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          6) CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          7) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD - time: -1 ms, Memory: 0

When I ran verify_conv2d("int8", "int32", tensor_format=1), there was no output except:

/home/ubuntu/workplace/tvm-1/python/tvm/driver/build_module.py:259: UserWarning: Specified target cuda, but cannot find device code, did you do bind?
  "bind?" % target)

I have access to an RTX 2060. I tried test_conv2d() and can reproduce the issues you are seeing.

  • time: -1 ms when running verify_conv2d("float16", "float32", tensor_format=1)
  • No output other than UserWarning when running verify_conv2d("int8", "int32", tensor_format=1)

In the first case, the result looks legit. In the second case, the output y and the reference c_np are both all zeros, so this test is broken.

Thank you for the information. I can also reproduce both cases on RTX 2080Ti.

The second case is caused by two mistakes:

After fixing those two bugs, the accuracy is still a problem. Here is my test output:

Mismatched elements: 21868 / 49152 (44.5%)
Max absolute difference: 559.
Max relative difference: 0.8148688
 x: array([[[[ -67,   47,   58, ...,   35,   81,   26],
         [ -16,  105, -128, ...,   39, -128,   62],
         [ 127,  102, -128, ...,    1,  -97, -128],...
 y: array([[[[ -67.,   47.,   58., ...,   35.,   81.,   26.],
         [ -16.,  105., -150., ...,   39., -178.,   62.],
         [ 234.,  102., -248., ...,    1.,  -97., -216.],...

I’m not sure which result is correct, tvm.topi or cuDNN. It would be great if you could also help look into it.
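One thing that stands out in the excerpt: wherever the two arrays differ, x sits exactly at -128 or 127 while y lies outside the int8 range. The sketch below checks this using only the values visible in the second and third printed rows. `saturate_int8` is a hypothetical helper written for this check, not a TVM or cuDNN function.

```python
INT8_MIN, INT8_MAX = -128, 127

def saturate_int8(v):
    """Clamp a value into the signed 8-bit range (hypothetical helper)."""
    return max(INT8_MIN, min(INT8_MAX, int(v)))

# Values visible in rows 2 and 3 of the printed x and y arrays.
x_vals = [-16, 105, -128, 39, -128, 62, 127, 102, -128, 1, -97, -128]
y_vals = [-16, 105, -150, 39, -178, 62, 234, 102, -248, 1, -97, -216]

clipped = [saturate_int8(v) for v in y_vals]
all_match = clipped == x_vals

print(all_match)
```

On these visible values, clamping y to the int8 range reproduces x. That would be consistent with one side saturating its output to int8 while the other keeps wider accumulations, but it does not by itself say which side is right.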

@Hzfengsy When I revert to your commit, the results are correct, even when I change test_cudnn.py to generate input data from -128 to 127. So something broke after the commits from @optima2005. I’ll look deeper into this.

Yes, it must be this PR by @optima2005: https://github.com/apache/incubator-tvm/pull/4418

I remember @optima2005’s and @Hzfengsy’s cuDNN PRs were in progress around the same time. We (@optima2005 and I) might have been sloppy with dtype handling. Our focus was on unifying the 2D and 3D implementations.

Has this bug been fixed?