[AutoTVM] Why is the number of lines (13) in the final tuning optimal log file not equal to the number of tasks (9)?

 tasks = autotvm.task.extract_from_program(
     irmod["main"],
     target=target,
     params=params,
     ops=(relay.op.get("nn.conv2d"),),
 )

 print(len(tasks))  # 9

 def tune_graph(graph, dshape, records, opt_sch_file, target, use_DP=True):
     # Graph-level tuning: select the best data layout for each conv2d,
     # accounting for layout-transform costs between neighboring ops.
     target_op = [relay.op.get("nn.conv2d")]
     Tuner = DPTuner if use_DP else PBQPTuner
     executor = Tuner(graph, {input_tensor: dshape}, records, target_op, target)
     executor.benchmark_layout_transform(min_exec_num=2000)
     executor.run()
     executor.write_opt_sch2record_file(opt_sch_file)

The optimal schedule file opt_sch_file contains 13 lines.

Why is the length of opt_sch_file not equal to the number of tasks?

Thanks in advance.

`len(tasks) == 9` indicates that your model has 9 unique conv2d workloads. Conv2d ops with the same shape and attributes are merged into one task, because they only need to be tuned once.

On the other hand, the graph tuner selects the best data layout for each op. This means that even if you have two conv2ds with the same shape, the graph tuner might select a different data layout for each of them, taking into account the data-layout-transform overheads with their previous and next ops.
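The graph tuner's idea can be illustrated with a toy dynamic program (this is a simplified sketch, not TVM's actual implementation): for a chain of ops, pick each op's layout so that per-op cost plus the cost of layout transforms between consecutive ops is minimal.

```python
def best_layouts(op_costs, transform_cost):
    """Toy DP over a chain of ops.

    op_costs: list of dicts mapping layout name -> execution cost for that op.
    transform_cost: function (layout_a, layout_b) -> cost of converting data.
    Returns (total_cost, chosen_layouts).
    """
    # dp maps each layout of the current op to (best total cost, layout path).
    dp = {layout: (cost, [layout]) for layout, cost in op_costs[0].items()}
    for costs in op_costs[1:]:
        new_dp = {}
        for layout, cost in costs.items():
            # Best predecessor layout, including the transform cost to us.
            prev_layout, (prev_cost, path) = min(
                dp.items(),
                key=lambda kv: kv[1][0] + transform_cost(kv[0], layout),
            )
            new_dp[layout] = (
                prev_cost + transform_cost(prev_layout, layout) + cost,
                path + [layout],
            )
        dp = new_dp
    return min(dp.values(), key=lambda v: v[0])


# Two conv2ds with hypothetical per-layout costs; switching layouts costs 2.
op_costs = [{"NCHW": 5, "NCHWc": 3}, {"NCHW": 4, "NCHWc": 3}]
cost, layouts = best_layouts(op_costs, lambda a, b: 0 if a == b else 2)
print(cost, layouts)  # 6 ['NCHWc', 'NCHWc']
```

With different transform costs (e.g. an expensive conversion around one op), the same DP can assign different layouts to ops of identical shape, which is why the optimal-schedule file is keyed per op rather than per task.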

In summary, we can guess that your model has 13 conv2d ops, 4 of which are identical to others in terms of shapes and attributes.
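The deduplication can be sketched like this (the workload tuples below are hypothetical, ResNet-like signatures invented for illustration; AutoTVM's real workload keys carry more fields):

```python
# 13 conv2d "workloads" as (op, input shape, kernel shape, strides) tuples.
# Ops with identical signatures collapse into a single tuning task.
workloads = [
    ("conv2d", (1, 3, 224, 224), (64, 3, 7, 7), (2, 2)),
    ("conv2d", (1, 64, 56, 56), (64, 64, 3, 3), (1, 1)),
    ("conv2d", (1, 64, 56, 56), (64, 64, 3, 3), (1, 1)),    # duplicate
    ("conv2d", (1, 64, 56, 56), (128, 64, 1, 1), (2, 2)),
    ("conv2d", (1, 128, 28, 28), (128, 128, 3, 3), (1, 1)),
    ("conv2d", (1, 128, 28, 28), (128, 128, 3, 3), (1, 1)), # duplicate
    ("conv2d", (1, 128, 28, 28), (256, 128, 1, 1), (2, 2)),
    ("conv2d", (1, 256, 14, 14), (256, 256, 3, 3), (1, 1)),
    ("conv2d", (1, 256, 14, 14), (256, 256, 3, 3), (1, 1)), # duplicate
    ("conv2d", (1, 256, 14, 14), (512, 256, 1, 1), (2, 2)),
    ("conv2d", (1, 512, 7, 7), (512, 512, 3, 3), (1, 1)),
    ("conv2d", (1, 512, 7, 7), (512, 512, 3, 3), (1, 1)),   # duplicate
    ("conv2d", (1, 512, 7, 7), (1024, 512, 1, 1), (1, 1)),
]
unique_tasks = set(workloads)
print(len(workloads), len(unique_tasks))  # 13 9
```

Tuning runs once per unique task (9 records), while the graph tuner writes one layout decision per op instance (13 lines).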


I see, thanks. By the way, when comparing performance between TVM and TensorFlow, I find that TensorFlow still performs better than TVM even though tuning gave a 20% speedup. What adjustments do I need to make? Thanks.

The model includes 13 conv2d ops and 3 dense ops.

The run script is similar to tutorials/autotvm/tune_relay_x86.

Did you add the right attributes to the target string? Something like `llvm -mcpu=core-avx2`.
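Without an `-mcpu` attribute, LLVM conservatively targets a generic x86 CPU and won't emit AVX2/AVX-512 vector instructions. A hypothetical helper (not part of TVM; the feature-to-`-mcpu` mapping here is an assumption) might pick the string from detected CPU feature flags:

```python
def llvm_target_for(features):
    """Pick an LLVM target string from a set of CPU feature flags.

    Hypothetical helper for illustration; check your actual CPU model
    and LLVM's supported -mcpu values before relying on a mapping.
    """
    if "avx512f" in features:
        return "llvm -mcpu=cascadelake"  # assumption: AVX-512 server CPU
    if "avx2" in features:
        return "llvm -mcpu=core-avx2"
    return "llvm"                        # portable fallback


print(llvm_target_for({"sse4_2", "avx2"}))  # llvm -mcpu=core-avx2
```

On Linux, the flags can be read from `/proc/cpuinfo`; `llc -mcpu=help` lists the CPU names your LLVM build accepts.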

I have changed the target string from "llvm" to "llvm -mcpu=cascadelake" and got a great performance result. Thank you very much.