Hello! I’m using AutoTVM to optimize an operator on my CPU (Intel Core i7-7700K), and the tuning output repeats the same result lines several times, like this (partial output):
No: 847 GFLOPS: 131.67/135.80 result: MeasureResult(costs=(0.00046187716083916086,), error_no=0, all_cost=5.4434287548065186, timestamp=1579822808.5451233) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['unroll', 'unroll', 'unroll', 'vec', 'unroll', 'none', 'unroll']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171264548000
No: 847 GFLOPS: 131.67/135.80 result: MeasureResult(costs=(0.00046187716083916086,), error_no=0, all_cost=5.4434287548065186, timestamp=1579822808.5451233) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['unroll', 'unroll', 'unroll', 'vec', 'unroll', 'none', 'unroll']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171264548000
No: 847 GFLOPS: 131.67/135.80 result: MeasureResult(costs=(0.00046187716083916086,), error_no=0, all_cost=5.4434287548065186, timestamp=1579822808.5451233) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['unroll', 'unroll', 'unroll', 'vec', 'unroll', 'none', 'unroll']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171264548000
No: 847 GFLOPS: 131.67/135.80 result: MeasureResult(costs=(0.00046187716083916086,), error_no=0, all_cost=5.4434287548065186, timestamp=1579822808.5451233) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['unroll', 'unroll', 'unroll', 'vec', 'unroll', 'none', 'unroll']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171264548000
No: 848 GFLOPS: 131.13/135.80 result: MeasureResult(costs=(0.0004637808035714286,), error_no=0, all_cost=6.061434030532837, timestamp=1579822809.2737823) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['none', 'unroll', 'none', 'unroll', 'none', 'vec', 'none']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171221136800
No: 848 GFLOPS: 131.13/135.80 result: MeasureResult(costs=(0.0004637808035714286,), error_no=0, all_cost=6.061434030532837, timestamp=1579822809.2737823) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['none', 'unroll', 'none', 'unroll', 'none', 'vec', 'none']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171221136800
No: 848 GFLOPS: 131.13/135.80 result: MeasureResult(costs=(0.0004637808035714286,), error_no=0, all_cost=6.061434030532837, timestamp=1579822809.2737823) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['none', 'unroll', 'none', 'unroll', 'none', 'vec', 'none']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171221136800
No: 848 GFLOPS: 131.13/135.80 result: MeasureResult(costs=(0.0004637808035714286,), error_no=0, all_cost=6.061434030532837, timestamp=1579822809.2737823) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['none', 'unroll', 'none', 'unroll', 'none', 'vec', 'none']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171221136800
No: 848 GFLOPS: 131.13/135.80 result: MeasureResult(costs=(0.0004637808035714286,), error_no=0, all_cost=6.061434030532837, timestamp=1579822809.2737823) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['none', 'unroll', 'none', 'unroll', 'none', 'vec', 'none']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171221136800
No: 848 GFLOPS: 131.13/135.80 result: MeasureResult(costs=(0.0004637808035714286,), error_no=0, all_cost=6.061434030532837, timestamp=1579822809.2737823) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['none', 'unroll', 'none', 'unroll', 'none', 'vec', 'none']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171221136800
No: 848 GFLOPS: 131.13/135.80 result: MeasureResult(costs=(0.0004637808035714286,), error_no=0, all_cost=6.061434030532837, timestamp=1579822809.2737823) [('split_h', [-1, 1, 1, 1]), ('split_w', [-1, 1, 1]), ('split_c', [-1, 1, 1, 128]), ('output_unroll', ['none', 'unroll', 'none', 'unroll', 'none', 'vec', 'none']), ('split_rc', [-1, 16, 4]), ('split_output_local_vec', [-1, 2, 4]), ('output_local_unroll', ['none', 'none', 'none', 'none']), ('split_vec', [-1, 1, 4]), ('reorder', (0, 2, 3, 1))],,None,1171221136800
Is this because multiple CPU threads are running the same measurement, which would be a waste of resources? Is there any way to avoid it?
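In the log above, every repeated line for a given trial number also carries the same `timestamp=` value. A minimal sketch for checking this on a saved log, assuming the progress lines are plain text in the format shown: it groups lines by trial number (`No: ...`) and counts distinct timestamps. If a trial has several lines but only one timestamp, the same `MeasureResult` was most likely printed several times rather than measured again. The helper `summarize` and the regex are my own (hypothetical), not part of AutoTVM, and this only shows *whether* the lines are duplicate prints, not where the duplication comes from.

```python
import re
from collections import defaultdict

# Hypothetical helper (not part of AutoTVM): captures the trial number after
# "No:" and the timestamp inside MeasureResult(...) from one progress line.
LINE_RE = re.compile(r"No: (\d+) .*?timestamp=([\d.]+)\)")


def summarize(lines):
    """Return {trial: (lines printed, distinct timestamps)} for AutoTVM log lines."""
    line_counts = defaultdict(int)
    timestamps = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not per-trial results
        trial = int(match.group(1))
        line_counts[trial] += 1
        timestamps[trial].add(match.group(2))
    return {t: (line_counts[t], len(timestamps[t])) for t in line_counts}


if __name__ == "__main__":
    # Shortened copies of the two records above (config lists trimmed).
    rec_847 = ("No: 847 GFLOPS: 131.67/135.80 result: MeasureResult("
               "costs=(0.00046187716083916086,), error_no=0, "
               "all_cost=5.4434287548065186, timestamp=1579822808.5451233) [...]")
    rec_848 = ("No: 848 GFLOPS: 131.13/135.80 result: MeasureResult("
               "costs=(0.0004637808035714286,), error_no=0, "
               "all_cost=6.061434030532837, timestamp=1579822809.2737823) [...]")
    print(summarize([rec_847] * 4 + [rec_848] * 7))
    # prints {847: (4, 1), 848: (7, 1)}
```

Each trial here maps to one distinct timestamp despite being printed 4 and 7 times, which points to repeated printing of a single record rather than repeated measurement.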
Thanks in advance!