Auto-tuning for Atom CPU gives worse results than untuned model

I followed the x86 auto-tuning tutorial to tune the TensorFlow mobilenet_v1_1.0_224_frozen.pb model for an AtomicPi (CPU: Atom x5-Z8350).

The spreadsheet below shows debug evaluation times for each operation for:

  • UNTUNED model
  • HISTORY_BEST log
  • GRAPH_OPT log

spreadsheet tune mobilenet_v1_1.0_224_frozen.pb for Atomicpi

For some reason the UNTUNED model is about 40% faster overall.
I also noticed that the operator shapes differ between the tuned and untuned models.

I highlighted records with significant time differences:
Red - tuned shows a worse time
Green - tuned shows a better time

I recommend using View-Zoom at 50-75% on a laptop screen.
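The red/green highlighting can be approximated with a small script. This is only a sketch: the function name `classify_ops`, the 20% threshold, and the sample values are all hypothetical, and it assumes operators can be matched by name between the two runs, which may not hold here because the tuned and untuned operator shapes differ.

```python
def classify_ops(untuned_ms, tuned_ms, threshold=0.2):
    """Label each operator by how its tuned time compares to its untuned time.

    'red'   - tuned is slower by more than `threshold` (relative)
    'green' - tuned is faster by more than `threshold` (relative)
    'same'  - the difference is within the threshold
    """
    labels = {}
    for op, base in untuned_ms.items():
        if op not in tuned_ms or base <= 0:
            continue  # skip operators that cannot be compared
        change = (tuned_ms[op] - base) / base
        if change > threshold:
            labels[op] = "red"
        elif change < -threshold:
            labels[op] = "green"
        else:
            labels[op] = "same"
    return labels


# Placeholder timings for illustration only, NOT taken from the spreadsheet.
untuned = {"conv2d_a": 10.0, "conv2d_b": 4.0, "softmax": 1.0}
tuned = {"conv2d_a": 15.0, "conv2d_b": 2.0, "softmax": 1.05}
labels = classify_ops(untuned, tuned)
for op, label in labels.items():
    print(op, label)
```

In practice the two dictionaries would be filled from the per-operator times in the debug evaluation output of each run.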

The autotune log files are on Dropbox:

   14773 Jul 30 05:24 mobilenet_v1_1.0_224_frozen.pb_graph_opt.log
 5589213 Jul 30 05:24 mobilenet_v1_1.0_224_frozen.pb.log
   10560 Jul 31 21:42 mobilenet_v1_1.0_224_frozen.pb.pick_best.log

I used the following tuning options:

tuning_option = {
    'log_filename': log_file,
    'tuner': 'random',
    'early_stopping': None,

    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.RPCRunner(
            device_key, host='localhost', port=tr_port,
            number=5,
            timeout=10,
        ),
    ),
}

I reran autotune with different parameters. The tuned model is still slower than the untuned one:

  • Untuned: 104 ms
  • Tuned: 112 ms (7.7% slower)

The tuning options for the rerun:
tuning_option = {
    'log_filename': log_file,
    'tuner': 'gridsearch',
    'early_stopping': None,

    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.RPCRunner(
            device_key, host='localhost', port=tr_port,
            min_repeat_ms=2000,
            number=10,
            repeat=1,
        ),
    ),
}
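
For reference, the 7.7% figure for the rerun is the relative difference between the two totals, taking the untuned time as the baseline. A minimal sketch of that arithmetic (the helper name `percent_slower` is hypothetical):

```python
def percent_slower(baseline_ms, other_ms):
    """Return how much slower `other_ms` is than `baseline_ms`, in percent."""
    return (other_ms - baseline_ms) / baseline_ms * 100.0


# Totals reported for the rerun: untuned 104 ms, tuned 112 ms.
slowdown = percent_slower(104.0, 112.0)
print(f"{slowdown:.1f}% slower")  # prints "7.7% slower"
```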

Debug Evaluation spreadsheet