Inconsistent timing from time_evaluator during the auto-tuning process

I’m attempting to auto-tune a convolution op and capture some performance characteristics of each kernel. Inside tvm/python/tvm/autotvm/tuner/tuner.py:119:tune(), I’ve added some intermediate testing as follows:

import numpy as np
import tvm

# existing line in Tuner.tune()
self.update(inputs, results)

# target='llvm -mcpu=core-avx2'
ctx = tvm.context(str(self.task.target), 0)
# N, CI, H, W, CO, KH, KW are the conv2d shape parameters of the task
a_tvm = tvm.nd.array(np.random.uniform(size=(N, CI, H, W)).astype(np.float32), ctx)
w_tvm = tvm.nd.array(np.random.uniform(size=(CO, CI, KH, KW)).astype(np.float32), ctx)
c_tvm = tvm.nd.array(np.zeros((N, CO, H, W), dtype=np.float32), ctx)

for k, (inp, res) in enumerate(zip(inputs, results)):
    print('RPC Results', res.costs)
    # Rebuild the kernel for this config and time it inside the tuning process
    sch, args = self.task.instantiate(inp.config)
    func = tvm.build(sch, args, target=self.task.target)
    evaluator = func.time_evaluator(func.entry_name, ctx=ctx, number=10)
    # NOTE: arrays are bound positionally, so they must follow the order of `args`
    print('Time Evaluator Results', evaluator(c_tvm, w_tvm, a_tvm).results)
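One detail to keep in mind when comparing the two outputs: as far as I understand the time_evaluator API, its `results` field holds one entry per `repeat` (default 1, which is why only one value is printed), and each entry is the mean time per call over `number` runs. The pure-Python sketch below mimics those number/repeat semantics so the measurement can be reasoned about without TVM. `time_like_evaluator` and `busy_work` are hypothetical names, and the untimed warm-up call is an assumption added here, not a claim about what TVM does internally.

```python
import time
from typing import Callable, Tuple


def time_like_evaluator(fn: Callable[[], None], number: int = 10,
                        repeat: int = 1, warmup: int = 1) -> Tuple[float, ...]:
    """Hypothetical analogue of number/repeat timing semantics.

    Returns one entry per `repeat`; each entry is the mean seconds per call
    over `number` consecutive calls. The `warmup` calls are not timed
    (an assumption of this sketch).
    """
    for _ in range(warmup):
        fn()  # untimed warm-up so one-off costs are not averaged in
    results = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            fn()
        elapsed = time.perf_counter() - start
        results.append(elapsed / number)  # mean seconds per call
    return tuple(results)


def busy_work() -> None:
    # Placeholder workload standing in for the compiled conv2d kernel.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    res = time_like_evaluator(busy_work, number=10, repeat=5)
    print(len(res), min(res))
```

With `repeat=5` the tuple has five entries, which makes it directly comparable in shape to the five-element RPC cost list.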

I would expect the results from the time evaluator to be the same as the RPC results; however, I get the following output:

ConfigSpace (len=1024, space_map=
   0 tile_co: Split(policy=factors, product=128, num_outputs=2) len=8
   1 tile_oh: Split(policy=factors, product=56, num_outputs=2) len=8
   2 tile_ow: Split(policy=factors, product=56, num_outputs=2) len=8
   3 reorder_0: Reorder(policy=candidate) len=2
)
 Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/512) | 0.00 s
Result time 33.08248257637024
RPC Results [0.380526586, 0.3808520404, 0.3809518632, 0.38147985009999996, 0.3820584419]
Time Evaluator Results (0.9775234639000001,)
 Current/Best:    4.87/   4.87 GFLOPS | Progress: (1/512) | 57.79 s
Result time 16.81979012489319
RPC Results [0.0415368888, 0.041549107099999996, 0.0415635531, 0.041600767500000004, 0.042007458600000006]
Time Evaluator Results (0.8076111211000001,)
 Current/Best:   44.55/  44.55 GFLOPS | Progress: (2/512) | 97.80 s
RPC Results [0.0662084582, 0.0662093598, 0.0662773046, 0.0664108298, 0.06643801569999999]
Time Evaluator Results (0.9272723295,)
 Current/Best:   27.99/  44.55 GFLOPS | Progress: (3/512) | 127.54 s
RPC Results [0.0322366298, 0.0322572968, 0.032335031, 0.0323755306, 0.0324979795]
Time Evaluator Results (0.8239123236999999,)
 Current/Best:   57.38/  57.38 GFLOPS | Progress: (4/512) | 155.74 s
RPC Results [0.06701076289999999, 0.0671261213, 0.06714235939999999, 0.0673643525, 0.06761882129999999]
Time Evaluator Results (0.3477886925,)
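To put numbers on the mismatch, the sketch below takes the RPC costs and evaluator results exactly as printed in the log above and computes, per configuration, the ratio of the evaluator time to the mean RPC time, plus the fastest-to-slowest ordering under each measurement. The ratio is far from constant (about 2.6x for the first configuration, about 25x for the fourth), and the ordering differs: the fourth configuration is fastest by RPC time, while the fifth is fastest by time_evaluator. This is only arithmetic on the logged values, not a diagnosis of the cause.

```python
from statistics import mean

# (RPC costs, time_evaluator result) pairs copied from the log, in seconds.
logged = [
    ([0.380526586, 0.3808520404, 0.3809518632, 0.38147985009999996, 0.3820584419],
     0.9775234639000001),
    ([0.0415368888, 0.041549107099999996, 0.0415635531, 0.041600767500000004,
      0.042007458600000006], 0.8076111211000001),
    ([0.0662084582, 0.0662093598, 0.0662773046, 0.0664108298, 0.06643801569999999],
     0.9272723295),
    ([0.0322366298, 0.0322572968, 0.032335031, 0.0323755306, 0.0324979795],
     0.8239123236999999),
    ([0.06701076289999999, 0.0671261213, 0.06714235939999999, 0.0673643525,
      0.06761882129999999], 0.3477886925),
]

ratios = []
for i, (rpc_costs, evaluator_cost) in enumerate(logged):
    rpc_mean = mean(rpc_costs)
    ratio = evaluator_cost / rpc_mean  # how many times slower the evaluator reports
    ratios.append(ratio)
    print(f"config {i}: RPC mean {rpc_mean:.4f} s, "
          f"evaluator {evaluator_cost:.4f} s, ratio {ratio:.1f}x")

# Config indices ordered from fastest to slowest under each measurement.
rpc_rank = sorted(range(len(logged)), key=lambda i: mean(logged[i][0]))
eval_rank = sorted(range(len(logged)), key=lambda i: logged[i][1])
print("RPC order:      ", rpc_rank)
print("Evaluator order:", eval_rank)
```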

Is there some reason that the results don’t match? Both are running on the local CPU, so I can’t see why they should be so different. Any help is much appreciated.

Some factors can cause AutoTVM measurements to be inaccurate. A fix is a work in progress.