I’m attempting to auto-tune a convolution op and capture some performance characteristics of each kernel. Inside tune() (tvm/python/tvm/autotvm/tuner/tuner.py:119), I’ve added some intermediate testing as follows:
self.update(inputs, results)
# target='llvm -mcpu=core-avx2'
ctx = tvm.context(str(self.task.target), 0)
# Random input and weight tensors, plus a zero-initialized output buffer
a_tvm = tvm.nd.array(np.random.uniform(size=(N, CI, H, W)).astype(np.float32), ctx)
w_tvm = tvm.nd.array(np.random.uniform(size=(CO, CI, KH, KW)).astype(np.float32), ctx)
c_tvm = tvm.nd.array(np.zeros((N, CO, H, W), dtype=np.float32), ctx)
for k, (inp, res) in enumerate(zip(inputs, results)):
    print('RPC Results', res.costs)
    # Rebuild the kernel for this config and time it locally
    sch, args = self.task.instantiate(inp.config)
    func = tvm.build(sch, args, target=self.task.target)
    evaluator = func.time_evaluator(func.entry_name, ctx=ctx, number=10)
    print('Time Evaluator Results', evaluator(c_tvm, w_tvm, a_tvm).results)
I would expect the results from the time evaluator to match the RPC results. However, I get the following output:
ConfigSpace (len=1024, space_map=
0 tile_co: Split(policy=factors, product=128, num_outputs=2) len=8
1 tile_oh: Split(policy=factors, product=56, num_outputs=2) len=8
2 tile_ow: Split(policy=factors, product=56, num_outputs=2) len=8
3 reorder_0: Reorder(policy=candidate) len=2
)
Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/512) | 0.00 s
Result time 33.08248257637024
RPC Results [0.380526586, 0.3808520404, 0.3809518632, 0.38147985009999996, 0.3820584419]
Time Evaluator Results (0.9775234639000001,)
Current/Best: 4.87/ 4.87 GFLOPS | Progress: (1/512) | 57.79 s
Result time 16.81979012489319
RPC Results [0.0415368888, 0.041549107099999996, 0.0415635531, 0.041600767500000004, 0.042007458600000006]
Time Evaluator Results (0.8076111211000001,)
Current/Best: 44.55/ 44.55 GFLOPS | Progress: (2/512) | 97.80 s
RPC Results [0.0662084582, 0.0662093598, 0.0662773046, 0.0664108298, 0.06643801569999999]
Time Evaluator Results (0.9272723295,)
Current/Best: 27.99/ 44.55 GFLOPS | Progress: (3/512) | 127.54 s
RPC Results [0.0322366298, 0.0322572968, 0.032335031, 0.0323755306, 0.0324979795]
Time Evaluator Results (0.8239123236999999,)
Current/Best: 57.38/ 57.38 GFLOPS | Progress: (4/512) | 155.74 s
RPC Results [0.06701076289999999, 0.0671261213, 0.06714235939999999, 0.0673643525, 0.06761882129999999]
Time Evaluator Results (0.3477886925,)
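To quantify the gap, here is a minimal sketch that divides each time evaluator result by the mean of the corresponding RPC costs. The numbers are copied from the output above; the names `rpc_costs`, `evaluator_results`, and `slowdown_ratios` are only illustrative and are not part of TVM. It assumes both sets of numbers are reported in seconds per run.

```python
from statistics import mean

# RPC costs per trial, copied from the 'RPC Results' lines above.
rpc_costs = [
    [0.380526586, 0.3808520404, 0.3809518632, 0.38147985009999996, 0.3820584419],
    [0.0415368888, 0.041549107099999996, 0.0415635531, 0.041600767500000004, 0.042007458600000006],
    [0.0662084582, 0.0662093598, 0.0662773046, 0.0664108298, 0.06643801569999999],
    [0.0322366298, 0.0322572968, 0.032335031, 0.0323755306, 0.0324979795],
    [0.06701076289999999, 0.0671261213, 0.06714235939999999, 0.0673643525, 0.06761882129999999],
]

# Single value per trial, copied from the 'Time Evaluator Results' lines above.
evaluator_results = [0.9775234639000001, 0.8076111211000001, 0.9272723295,
                     0.8239123236999999, 0.3477886925]


def slowdown_ratios(rpc, evaluator):
    """Return evaluator time divided by mean RPC cost for each trial."""
    return [e / mean(costs) for costs, e in zip(rpc, evaluator)]


ratios = slowdown_ratios(rpc_costs, evaluator_results)
for trial, ratio in enumerate(ratios, start=1):
    print(f"trial {trial}: time evaluator / mean RPC cost = {ratio:.2f}")
```

For these five trials the time evaluator result is larger than the mean RPC cost every time, and the ratio varies from trial to trial rather than being a constant factor.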
Is there some reason that the results don’t match? Both are running on the local CPU, so I can’t see why they should be so different. Any help is much appreciated.