AutoTVM tutorial produces 21 conv2d schedules for "llvm" and only 12 conv2d schedules for "cuda"

Hi everyone,

When I run the tutorial "Auto-tuning a convolutional network for x86 CPU", the optimal log contains 21 lines, one for each of the 21 convolutions in ResNet-18. However, when I run the tutorial "Auto-tuning a convolutional network for NVIDIA GPU", the optimal log contains only 12 lines for 12 convolutions, even though it is the same network (or so I understand). Does anybody know why this happens?

In addition, do you know why there are no schedules for the other layers? Are they not being tuned, or are they perhaps fused with the convolutions?
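(For context, this is roughly how I am counting the entries; a minimal sketch that just parses the JSON records in the log, where the file name is a placeholder for the log produced by the tutorial.)

```python
# Hypothetical sketch: count records and distinct conv2d workloads in an
# AutoTVM log. Each line is a JSON record whose "i" field starts with
# [target, task_name, task_args, ...]. "resnet18_opt.log" is a placeholder.
import json

with open("resnet18_opt.log") as f:
    records = [json.loads(line) for line in f if line.strip()]

workloads = {json.dumps(rec["i"][1:3]) for rec in records}  # task name + args

print("total records:   ", len(records))
print("unique workloads:", len(workloads))
```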

I would appreciate any help with this issue.

Could you post the workloads that only appear for the LLVM target?

Hi, thanks a lot for your prompt response. I have modified the description above to be more specific about the tutorials I am running.

Please find the optimal configurations for "llvm" (ResNet-18) below. I have enumerated the layers for convenience. One difference, for instance, is that for "llvm" there are 3 schedules corresponding to the 3 convolutions with input size [1, 512, 7, 7], whereas in the CUDA log (shown in my reply below this one) there is only 1 schedule for that convolution size.

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 224, 224], "float32"], ["TENSOR", [64, 3, 7, 7], "float32"], [2, 2], [3, 3, 3, 3], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 224, 224, "float32"], [64, 3, 7, 7, "float32"], [2, 2], [3, 3, 3, 3], [1, 1], "NCHW", "float32"], {"i": 136, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 32]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.01014757672815534], 0, 2.5155513286590576, 1581436858.4419928], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [64, 64, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [64, 64, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 82, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.009950462727272727], 0, 2.7125132083892822, 1581437816.7206779], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [64, 64, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [64, 64, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 82, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.009950462727272727], 0, 2.7125132083892822, 1581437816.7206779], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [64, 64, 1, 1], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [64, 64, 1, 1, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {"i": 425, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 1]], ["tile_oh", "ot", 2]]}], "r": [[0.0011312621484992102], 0, 3.207817316055298, 1581441951.768949], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [64, 64, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [64, 64, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 82, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.009950462727272727], 0, 2.7125132083892822, 1581437816.7206779], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [64, 64, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [64, 64, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 82, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.009950462727272727], 0, 2.7125132083892822, 1581437816.7206779], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [128, 64, 3, 3], "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [128, 64, 3, 3, "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 89, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.005048907204472844], 0, 4.021136522293091, 1581443233.6869035], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 128, 28, 28], "float32"], ["TENSOR", [128, 128, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 128, 28, 28, "float32"], [128, 128, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 101, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.009989094614814816], 0, 3.1271796226501465, 1581446749.781529], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [128, 64, 1, 1], "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [128, 64, 1, 1, "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {"i": 368, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 16]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 1]], ["tile_oh", "ot", 2]]}], "r": [[0.0005921782860057119], 0, 3.293405532836914, 1581446003.794353], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 128, 28, 28], "float32"], ["TENSOR", [128, 128, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 128, 28, 28, "float32"], [128, 128, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 101, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.009989094614814816], 0, 3.1271796226501465, 1581446749.781529], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 128, 28, 28], "float32"], ["TENSOR", [128, 128, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 128, 28, 28, "float32"], [128, 128, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 101, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.009989094614814816], 0, 3.1271796226501465, 1581446749.781529], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 128, 28, 28], "float32"], ["TENSOR", [256, 128, 3, 3], "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 128, 28, 28, "float32"], [256, 128, 3, 3, "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 109, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 32]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", true]]}], "r": [[0.005018088178321678], 0, 3.9786384105682373, 1581449992.5711432], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 256, 14, 14], "float32"], ["TENSOR", [256, 256, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 256, 14, 14, "float32"], [256, 256, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 188, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 256]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", true]]}], "r": [[0.009956693457142857], 0, 3.0830090045928955, 1581453318.564188], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 128, 28, 28], "float32"], ["TENSOR", [256, 128, 1, 1], "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 128, 28, 28, "float32"], [256, 128, 1, 1, "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {"i": 324, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 16]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 1]], ["tile_oh", "ot", 2]]}], "r": [[0.0005877056827348746], 0, 3.120971918106079, 1581451718.8096752], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 256, 14, 14], "float32"], ["TENSOR", [256, 256, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 256, 14, 14, "float32"], [256, 256, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 188, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 256]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", true]]}], "r": [[0.009956693457142857], 0, 3.0830090045928955, 1581453318.564188], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 256, 14, 14], "float32"], ["TENSOR", [256, 256, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 256, 14, 14, "float32"], [256, 256, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 188, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 256]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", true]]}], "r": [[0.009956693457142857], 0, 3.0830090045928955, 1581453318.564188], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 256, 14, 14], "float32"], ["TENSOR", [512, 256, 3, 3], "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 256, 14, 14, "float32"], [512, 256, 3, 3, "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 305, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 256]], ["tile_oc", "sp", [-1, 8]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", false]]}], "r": [[0.005143994603278689], 0, 4.259227514266968, 1581454611.755201], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 512, 7, 7], "float32"], ["TENSOR", [512, 512, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 512, 7, 7, "float32"], [512, 512, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 128, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 256]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", true]]}], "r": [[0.009949430639639639], 0, 2.7014334201812744, 1581456113.8993032], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 256, 14, 14], "float32"], ["TENSOR", [512, 256, 1, 1], "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 256, 14, 14, "float32"], [512, 256, 1, 1, "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {"i": 223, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 128]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 1]], ["tile_oh", "ot", 2]]}], "r": [[0.0005641878775181305], 0, 3.1547939777374268, 1581454752.9796696], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 512, 7, 7], "float32"], ["TENSOR", [512, 512, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 512, 7, 7, "float32"], [512, 512, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 128, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 256]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", true]]}], "r": [[0.009949430639639639], 0, 2.7014334201812744, 1581456113.8993032], "v": 0.1}

{"i": ["llvm", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 512, 7, 7], "float32"], ["TENSOR", [512, 512, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 512, 7, 7, "float32"], [512, 512, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"i": 128, "c": null, "t": "direct", "e": [["tile_ic", "sp", [-1, 256]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", true]]}], "r": [[0.009949430639639639], 0, 2.7014334201812744, 1581456113.8993032], "v": 0.1}

For CUDA, same ResNet-18. It only returns schedules for 12 layers (I wonder why). The inference time is very close to what is shown on the website (1.15 ms).

{"r": [[5.301165607765277e-05], 0, 3.416822910308838, 1581603631.1435277], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 512, 7, 7], "float32"], ["TENSOR", [512, 512, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 512, 7, 7, "float32"], [512, 512, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"e": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 16, 4]], ["tile_x", "sp", [-1, 1, 8, 2]], ["tile_rc", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]], "c": null, "t": "winograd", "i": 190206}]}

{"r": [[1.3176676375972394e-05], 0, 36.70834970474243, 1581605845.2810543], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 256, 14, 14], "float32"], ["TENSOR", [512, 256, 1, 1], "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 256, 14, 14, "float32"], [512, 256, 1, 1, "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {"e": [["tile_f", "sp", [-1, 2, 16, 1]], ["tile_y", "sp", [-1, 1, 1, 1]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]], "c": null, "t": "direct", "i": 79235}]}

{"r": [[8.606766642780366e-05], 0, 3.5750463008880615, 1581607432.4720902], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 256, 14, 14], "float32"], ["TENSOR", [512, 256, 3, 3], "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 256, 14, 14, "float32"], [512, 256, 3, 3, "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"e": [["tile_f", "sp", [-1, 2, 4, 2]], ["tile_y", "sp", [-1, 1, 7, 1]], ["tile_x", "sp", [-1, 1, 1, 1]], ["tile_rc", "sp", [-1, 8]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]], "c": null, "t": "direct", "i": 612993}]}

{"r": [[3.844542276161389e-05], 0, 31.977830171585083, 1581609095.2137506], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 256, 14, 14], "float32"], ["TENSOR", [256, 256, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 256, 14, 14, "float32"], [256, 256, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"e": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 16, 2]], ["tile_x", "sp", [-1, 7, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]], "c": null, "t": "winograd", "i": 37032}]}

{"r": [[7.893610215623896e-06], 0, 10.36836051940918, 1581611070.8119206], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 128, 28, 28], "float32"], ["TENSOR", [256, 128, 1, 1], "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 128, 28, 28, "float32"], [256, 128, 1, 1, "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {"e": [["tile_f", "sp", [-1, 4, 16, 1]], ["tile_y", "sp", [-1, 1, 1, 1]], ["tile_x", "sp", [-1, 1, 14, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]], "c": null, "t": "direct", "i": 865952}]}

{"r": [[4.448707873573967e-05], 0, 38.539387226104736, 1581613358.0153213], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 128, 28, 28], "float32"], ["TENSOR", [256, 128, 3, 3], "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 128, 28, 28, "float32"], [256, 128, 3, 3, "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"e": [["tile_f", "sp", [-1, 2, 8, 1]], ["tile_y", "sp", [-1, 1, 2, 1]], ["tile_x", "sp", [-1, 1, 7, 2]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]], "c": null, "t": "direct", "i": 7970845}]}

{"r": [[3.138099692142953e-05], 0, 13.580451726913452, 1581616193.9938211], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 128, 28, 28], "float32"], ["TENSOR", [128, 128, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 128, 28, 28, "float32"], [128, 128, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"e": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 8, 4]], ["tile_x", "sp", [-1, 7, 28, 1]], ["tile_rc", "sp", [-1, 32]], ["auto_unroll_max_step", "ot", 128], ["unroll_explicit", "ot", 1]], "c": null, "t": "winograd", "i": 447559}]}

{"r": [[6.707924643584522e-06], 0, 15.57893967628479, 1581618869.209506], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [128, 64, 1, 1], "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [128, 64, 1, 1, "float32"], [2, 2], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {"e": [["tile_f", "sp", [-1, 4, 8, 1]], ["tile_y", "sp", [-1, 1, 1, 1]], ["tile_x", "sp", [-1, 2, 14, 1]], ["tile_rc", "sp", [-1, 8]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]], "c": null, "t": "direct", "i": 3340823}]}

{"r": [[3.371866992843581e-05], 0, 3.500437021255493, 1581621338.547618], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [128, 64, 3, 3], "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [128, 64, 3, 3, "float32"], [2, 2], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"e": [["tile_f", "sp", [-1, 1, 16, 4]], ["tile_y", "sp", [-1, 1, 2, 1]], ["tile_x", "sp", [-1, 1, 7, 2]], ["tile_rc", "sp", [-1, 4]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]], "c": null, "t": "direct", "i": 26036002}]}

{"r": [[7.36766881134491e-06], 0, 8.708645343780518, 1581623659.4519675], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [64, 64, 1, 1], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [64, 64, 1, 1, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NCHW", "float32"], {"e": [["tile_f", "sp", [-1, 8, 8, 1]], ["tile_y", "sp", [-1, 1, 1, 1]], ["tile_x", "sp", [-1, 2, 28, 1]], ["tile_rc", "sp", [-1, 8]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]], "c": null, "t": "direct", "i": 20616981}]}

{"r": [[2.785163836772983e-05], 0, 46.4191677570343, 1581626071.820149], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 64, 56, 56], "float32"], ["TENSOR", [64, 64, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 64, 56, 56, "float32"], [64, 64, 3, 3, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "float32"], {"e": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 8, 2]], ["tile_x", "sp", [-1, 7, 28, 1]], ["tile_rc", "sp", [-1, 32]], ["auto_unroll_max_step", "ot", 128], ["unroll_explicit", "ot", 1]], "c": null, "t": "winograd", "i": 279680}]}

{"r": [[3.802579433070866e-05], 0, 9.277168035507202, 1581628867.5058868], "v": 0.1, "i": ["cuda -model=unknown", "topi_nn_conv2d", [["TENSOR", [1, 3, 224, 224], "float32"], ["TENSOR", [64, 3, 7, 7], "float32"], [2, 2], [3, 3, 3, 3], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 224, 224, "float32"], [64, 3, 7, 7, "float32"], [2, 2], [3, 3, 3, 3], [1, 1], "NCHW", "float32"], {"e": [["tile_f", "sp", [-1, 2, 8, 4]], ["tile_y", "sp", [-1, 8, 1, 1]], ["tile_x", "sp", [-1, 1, 14, 1]], ["tile_rc", "sp", [-1, 1]], ["tile_ry", "sp", [-1, 7]], ["tile_rx", "sp", [-1, 7]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 0]], "c": null, "t": "direct", "i": 23438078}]}

If you print out all extracted tasks before tuning, you will find that ResNet-18 includes 12 tasks (unique conv2d workloads), so the CUDA log makes sense. Note that conv2d ops with the same shapes and attributes are extracted only once.
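For example, something along these lines, following the tutorials' extraction step (a rough sketch; the exact `ops` argument and import paths vary a bit across TVM versions):

```python
from tvm import autotvm, relay
from tvm.relay import testing

# Same ResNet-18 workload the tutorials build (batch size 1, float32).
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1, dtype="float32")
target = "cuda"  # or "llvm" for the x86 tutorial

# Extract the tunable conv2d tasks; layers with identical workloads
# collapse into a single task.
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params,
    ops=(relay.op.get("nn.conv2d"),),
)

# For ResNet-18 this should list 12 distinct conv2d workloads,
# matching the 12 lines in the CUDA log.
for i, task in enumerate(tasks):
    print(i, task.name, task.args)
```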

I guess the reason you got 3 schedules for the same conv2d workload in the LLVM log is graph tuning. Graph tuning may select different schedules for two different conv2d layers even if they have the same shapes and attributes, because it takes the data layout transform overhead into account.

cc @kevinthesun

For x86, graph tuning generates one optimal schedule for each conv2d layer, while for GPU only each distinct conv2d workload gets a schedule, since graph tuning is not required.
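The x86 tutorial's graph-tuning step looks roughly like the sketch below (constructor arguments and the expected `target_ops` format differ across TVM versions; `mod`, `records`, `input_name`, `dshape`, `target`, and `opt_sch_file` are assumed to be set up as in the tutorial):

```python
from tvm import relay
from tvm.autotvm.graph_tuner import DPTuner

# Rough sketch of the x86 tutorial's graph-tuning step. It writes one record
# per conv2d *layer* (21 for ResNet-18), even when several layers share the
# same workload, because layout-transform costs between layers can differ.
def tune_graph(mod, input_name, dshape, records, opt_sch_file, target):
    target_ops = [relay.op.get("nn.conv2d")]
    executor = DPTuner(mod["main"], {input_name: dshape}, records, target_ops, target)
    executor.benchmark_layout_transform(min_exec_num=2000)
    executor.run()
    executor.write_opt_sch_file(opt_sch_file)
```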

Thank you very much for your responses; now I understand the reason behind the differences :slight_smile: