[Error] Float16 for Cuda with Autotvm

I converted the InceptionV3 model (a Relay model) to FP16, without converting the dense and softmax ops.
It runs correctly on a Tesla P100, but the performance is poor: inference time is about 70 ms.
So I used AutoTVM to tune the conv2d ops. After tuning, I used the tuning log to run the model, but I hit this error:

File “/wda/tvm-fp16-pass/src/runtime/module_util.cc”, line 73
TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (73 == int32(arg3.shape[2])), Argument arg3.shape[2] has an unsatisfied constraint

I found that this error is raised by the op below; its best config entity is as follows:

{"i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 80, 73, 73], "float16"], ["TENSOR", [192, 80, 3, 3], "float16"], [1, 1], [0, 0], [1, 1], "NCHW", "float16"], {}, ["conv2d", [1, 80, 73, 73, "float16"], [192, 80, 3, 3, "float16"], [1, 1], [0, 0], [1, 1], "NCHW", "float16"], {"i": 19814422, "t": "winograd", "c": null, "e": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 8, 3]], ["tile_x", "sp", [-1, 4, 36, 1]], ["tile_rc", "sp", [-1, 40]], ["auto_unroll_max_step", "ot", 128], ["unroll_explicit", "ot", 1]]}], "r": [[0.00015195903128991062], 0, 33.547781229019165, 1572904636.4375215], "v": 0.1}

@vinx13 @ibeltagy @hhhh @xyzhou @ydy @comaniac
Can you help me? Thanks.

I have this issue as well. For now, a workaround is to set opt_level to 2. The issue is still under investigation.
You can refer to [Auto-tune] Error occurs during inference when using auto-tuned schedule for further updates.
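
For example, a minimal sketch of the workaround (assuming mod/params are your converted FP16 model and the tuning log name is illustrative):

from tvm import relay, autotvm

# Workaround: build at opt_level=2, which skips the level-3 passes (e.g. AlterOpLayout).
with autotvm.apply_history_best("inception_v3_fp16.log"):
    with relay.build_config(opt_level=2):
        graph, lib, params = relay.build(mod, target="cuda", params=params)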

Does TVM already support FP16 for GPU? Can I quantize the model to FP16 and auto-tune it?
Thanks.


The master branch doesn’t currently support FP16 for GPU, but you can use this commit.

Thanks to @xyzhou.

I will try it. Thanks a lot.

Any updates on float16?