cuDNN Tensor Core support gives wrong results and strange timing for fp16 and int8

Laurawly · March 5, 2020, 8:39pm

After PR #4353, we are able to run Tensor Core-based convolution with cuDNN in TVM for fp16 and int8. However, when I run the test file test_cudnn.py, fp16 convolution sometimes gives flaky wrong results, and the reported timing is always -1 ms. I wonder what causes these strange results. @Hzfengsy @masahi

Here are the results when I ran verify_conv2d("float16", "float32", tensor_format=1) on a Tesla T4 GPU, after changing the input shape as follows:

    in_channel = 512
    out_channel = 512
    filter_h = 3
    filter_w = 3
    pad_h = 1
    pad_w = 1
    stride_h = 1
    stride_w = 1
    dilation_h = 1
    dilation_w = 1
    batch = 1
    height = 7
    weight = 7

Sometimes it gave me a mismatch error like this:

    Mismatched elements: 1 / 25088 (0.00399%)
    Max absolute difference: 8.17340421e-05
    Max relative difference: 0.03795133
     x: array([[[[-13.311087,  26.494438, -25.143475, ...,  11.120489,
                0.849933,  -5.120694],
             [-10.676369, -19.9305  , -11.853168, ...,  -8.573727,...
     y: array([[[[-13.311075,  26.494409, -25.143463, ...,  11.120492,
                0.849936,  -5.120693],
             [-10.676372, -19.930494, -11.853153, ...,  -8.57373 ,...
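The numbers in this report can be checked by hand. With the shape above, a 3x3 filter with stride 1 and pad 1 keeps the 7x7 spatial size, so the output holds 1 * 7 * 7 * 512 = 25088 elements and a single mismatch is about 0.00399%. A large relative difference next to a tiny absolute one most likely comes from a reference element close to zero. The sketch below applies NumPy's tolerance rule; the tolerances and element values in it are assumptions for illustration, not the ones test_cudnn.py uses.

```python
import numpy as np

# Output size for the modified test: a 3x3 filter with stride 1 and pad 1
# keeps the 7x7 spatial size, with 512 output channels and batch 1.
batch, height, width, out_channel = 1, 7, 7, 512
num_elements = batch * height * width * out_channel
print(num_elements)                       # 25088
print(f"{100 * 1 / num_elements:.5f}%")   # 0.00399%

# numpy.isclose passes when |actual - desired| <= atol + rtol * |desired|.
# The tolerances and element values below are assumed for illustration.
rtol, atol = 1e-5, 1e-5
desired_small, desired_large = 0.002, 13.3
actual_small = desired_small + 8e-5       # abs diff 8e-5, rel diff ~0.04
actual_large = desired_large + 8e-5       # same abs diff, rel diff ~6e-6

# The same absolute error fails for a near-zero element...
print(np.isclose(actual_small, desired_small, rtol=rtol, atol=atol))  # False
# ...but passes for a large one, because rtol * |desired| widens the bound.
print(np.isclose(actual_large, desired_large, rtol=rtol, atol=atol))  # True
```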

When the results are correct, the timing is still strange, with every algorithm reporting -1 ms:

    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:344:  CUDNN Found 8 fwd algorithms, choosing CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          0) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM - time: -1 ms, Memory: 0
    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          1) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: -1 ms, Memory: 0
    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          2) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: -1 ms, Memory: 0
    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          3) CUDNN_CONVOLUTION_FWD_ALGO_GEMM - time: -1 ms, Memory: 0
    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          4) CUDNN_CONVOLUTION_FWD_ALGO_DIRECT - time: -1 ms, Memory: 0
    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          5) CUDNN_CONVOLUTION_FWD_ALGO_FFT - time: -1 ms, Memory: 0
    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          6) CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING - time: -1 ms, Memory: 0
    [20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347:          7) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD - time: -1 ms, Memory: 0
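When every entry reports -1 ms, it may be worth logging each result's status field next to its time, since a time attached to an entry that did not succeed is unlikely to be a real measurement. The sketch below shows that selection logic in Python. AlgoPerf and pick_fastest are hypothetical names that mirror only a few fields of cuDNN's cudnnConvolutionFwdAlgoPerf_t, and the timings in the usage are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlgoPerf:
    """Hypothetical mirror of a few cudnnConvolutionFwdAlgoPerf_t fields."""
    algo: str
    status: str      # e.g. "CUDNN_STATUS_SUCCESS"
    time_ms: float   # reported execution time in milliseconds
    memory: int      # workspace size in bytes


def pick_fastest(results: List[AlgoPerf]) -> Optional[AlgoPerf]:
    """Return the fastest successful entry, or None if nothing is usable.

    Entries whose status is not success, or whose time is negative, are
    skipped, so a placeholder such as -1 ms is never treated as a timing.
    """
    usable = [r for r in results
              if r.status == "CUDNN_STATUS_SUCCESS" and r.time_ms >= 0]
    return min(usable, key=lambda r: r.time_ms) if usable else None


# Made-up timings, for illustration only.
results = [
    AlgoPerf("CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM", "CUDNN_STATUS_SUCCESS", 0.12, 0),
    AlgoPerf("CUDNN_CONVOLUTION_FWD_ALGO_FFT", "CUDNN_STATUS_NOT_SUPPORTED", -1.0, 0),
    AlgoPerf("CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD", "CUDNN_STATUS_SUCCESS", 0.08, 0),
]
print(pick_fastest(results).algo)  # CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD

# If every entry reports -1 ms, as in the log above, nothing is selectable.
all_failed = [AlgoPerf(r.algo, r.status, -1.0, r.memory) for r in results]
print(pick_fastest(all_failed))    # None
```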

When I ran verify_conv2d("int8", "int32", tensor_format=1), there was no output except this warning:

    /home/ubuntu/workplace/tvm-1/python/tvm/driver/build_module.py:259: UserWarning: Specified target cuda, but cannot find device code, did you do bind?
      "bind?" % target)

masahi · March 6, 2020, 11:01am

I have access to an RTX 2060. I tried test_conv2d() and can reproduce the issues you are seeing:

1. time: -1 ms when running verify_conv2d("float16", "float32", tensor_format=1)
2. No output other than a UserWarning when running verify_conv2d("int8", "int32", tensor_format=1)

In the first case, the result looks correct. In the second case, both the output y and the reference c_np are all zeros, so the test is broken.

Hzfengsy · March 6, 2020, 2:29pm

Thank you for the information. I can also reproduce both cases on an RTX 2080 Ti.

The second case is caused by two mistakes:

1. In the test case (https://github.com/apache/incubator-tvm/blob/master/tests/python/contrib/test_cudnn.py#L69), we generate data from -1 to 1, so after casting to int8, every value becomes 0.
2. According to the cuDNN documentation, alpha and beta must be float even when the data type is int, but TVM passes them with the wrong type (https://github.com/apache/incubator-tvm/blob/master/src/runtime/contrib/cudnn/conv_forward.cc#L168).
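Both mistakes can be reproduced without a GPU. The first part of this sketch shows that uniform samples from [-1, 1) truncate to zero when cast to int8, then uses NumPy's Generator.integers as one possible way (an assumption, not the test's actual code) to cover the full int8 range. The second part assumes the four bytes of an int32 value of 1 end up where cuDNN reads a float32 scaling factor, and shows that alpha would then be the smallest subnormal float, effectively zero. The input shape is also assumed, for illustration only.

```python
import struct

import numpy as np

rng = np.random.default_rng(0)
shape = (3, 32, 32, 4)  # assumed NHWC input shape, for illustration only

# Mistake 1: samples from [-1, 1) truncate toward zero when cast to int8.
truncated = rng.uniform(-1, 1, size=shape).astype(np.int8)
print(np.count_nonzero(truncated))        # 0

# One possible fix (an assumption, not the test's actual code): draw
# integers over the full int8 range [-128, 127] directly.
full_range = rng.integers(-128, 127, size=shape, dtype=np.int8, endpoint=True)
print(np.count_nonzero(full_range) > 0)   # True

# Mistake 2: cuDNN reads alpha/beta as float here. If the four bytes of an
# int32 1 were read as a float32, alpha would be the smallest subnormal float.
alpha_bits_as_float = struct.unpack("<f", struct.pack("<i", 1))[0]
print(alpha_bits_as_float)                # 1.401298464324817e-45
alpha_float = struct.unpack("<f", struct.pack("<f", 1.0))[0]
print(alpha_float)                        # 1.0
```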

After fixing those two bugs, the accuracy is still wrong. Here is my test output:

    Mismatched elements: 21868 / 49152 (44.5%)
    Max absolute difference: 559.
    Max relative difference: 0.8148688
     x: array([[[[ -67,   47,   58, ...,   35,   81,   26],
             [ -16,  105, -128, ...,   39, -128,   62],
             [ 127,  102, -128, ...,    1,  -97, -128],...
     y: array([[[[ -67.,   47.,   58., ...,   35.,   81.,   26.],
             [ -16.,  105., -150., ...,   39., -178.,   62.],
             [ 234.,  102., -248., ...,    1.,  -97., -216.],...

I’m not sure whether tvm.topi or cuDNN gives the correct result. It would be great if you could also help look into it.
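One observation that may help: in the visible entries of the report, x and y agree wherever y lies inside the int8 range [-128, 127], and where y lies outside it, x sits exactly at -128 or 127. The sketch below checks this on the visible values only (the elided "..." entries are not included). It suggests that x, the integer-typed array, is being saturated to int8, although it does not by itself show which side is at fault.

```python
import numpy as np

# Visible entries copied from the mismatch report; "..." entries omitted.
x = np.array([[-67, 47, 58, 35, 81, 26],
              [-16, 105, -128, 39, -128, 62],
              [127, 102, -128, 1, -97, -128]])
y = np.array([[-67., 47., 58., 35., 81., 26.],
              [-16., 105., -150., 39., -178., 62.],
              [234., 102., -248., 1., -97., -216.]])

# Saturating y to the int8 range reproduces every visible x value.
y_saturated = np.clip(y, -128, 127).astype(np.int8)
print(np.array_equal(x, y_saturated))     # True

# Mismatches occur exactly where y falls outside [-128, 127].
outside = (y < -128) | (y > 127)
print(np.array_equal(x != y, outside))    # True
```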

Laurawly · March 12, 2020, 2:42am

@Hzfengsy When I revert to your commit, the results are correct, even when I change test_cudnn.py to generate input data from -128 to 127. So something broke after the commits from @optima2005. I’ll look deeper into this.

masahi · March 12, 2020, 2:04am

Yes, this must be caused by this PR from @optima2005: https://github.com/apache/incubator-tvm/pull/4418

I remember @optima2005’s and @Hzfengsy’s cuDNN PRs were happening around the same time. We (@optima2005 and I) might have been sloppy with dtype handling, since our focus was unifying the 2D and 3D implementations.

blueskyltx · June 3, 2020, 8:40am

Has this bug been fixed?

masahi · November 29, 2021, 12:32am

Should be fixed by https://github.com/apache/tvm/pull/9600