GraphTuner in Intel Graphics - optimizing CNN models on iGPUs


#1

Recently I read a paper on optimizing CNN models on iGPUs (https://arxiv.org/pdf/1907.02154.pdf).
The models in the paper are optimized using AutoTVM and the graph tuner.
So I have a couple of questions:

  1. How can I run the intel_graphics target with GraphTuner?
    I found that the x86 example uses graph tuning. I tried the same method for intel_graphics, but it didn’t work.
  2. Which OpenVINO version was used to measure the performance?
  3. Which OpenVINO sample application did you use, or did you write your own program?

@Laurawly, thank you in advance.


#2
  1. We put the already-tuned version into the schedule; it is triggered when building with opt_level=3.
  2. It should be OpenVINO R3, according to the reference in the paper.
  3. We used the supported applications from OpenVINO listed in the release notes: https://software.intel.com/en-us/articles/OpenVINO-RelNotes.

#3
  1. I copied tune_graph from the x86 example into the CUDA example and ran ResNet-18 on intel_graphics. During graph tuning I get the following exception:

Traceback (most recent call last):
  File "tvm/tutorials/autotvm/tune_relay_cuda.py", line 303, in <module>
    tune_and_evaluate(tuning_option)
  File "tvm/tutorials/autotvm/tune_relay_cuda.py", line 249, in tune_and_evaluate
    tune_graph(mod["main"], input_shape, log_file, graph_opt_sch_file)
  File "tvm/tutorials/autotvm/tune_relay_cuda.py", line 233, in tune_graph
    executor.run()
  File "/home/ajja/tvm/python/tvm/autotvm/graph_tuner/dynamic_programming_tuner.py", line 205, in run
    self._backward()
  File "/home/ajja/tvm/python/tvm/autotvm/graph_tuner/dynamic_programming_tuner.py", line 108, in _backward
    num_states = states_list[0][3].size
IndexError: list index out of range

I investigated the error and noticed that, while the graph is traversed, every node is removed when the dictionary mapping op_name nodes to their closest input ancestors is created. That presumably leaves states_list empty, so states_list[0] raises the IndexError. I’m not sure what to change to fix it.
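To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not TVM's graph tuner code: the graph representation and the functions closest_target_ancestors and backward_first_state are hypothetical, and it assumes (as the observation above suggests) that nodes are kept only when their op name is in a set of target ops. If none of the graph's op names match, the mapping comes out empty and indexing the first state raises exactly `IndexError: list index out of range`.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified graph: node index -> (op_name, input node indices).
Graph = Dict[int, Tuple[str, List[int]]]


def closest_target_ancestors(graph: Graph, target_ops: Set[str]) -> Dict[int, List[int]]:
    """Map each target-op node to its closest target-op ancestors.

    Non-target nodes are skipped and their inputs are searched recursively.
    If no op name in the graph is in target_ops, the result is empty.
    """
    def ancestors(idx: int) -> List[int]:
        found: List[int] = []
        for parent in graph[idx][1]:
            if graph[parent][0] in target_ops:
                found.append(parent)
            else:
                found.extend(ancestors(parent))
        return found

    return {idx: ancestors(idx) for idx, (op, _) in graph.items() if op in target_ops}


def backward_first_state(node_map: Dict[int, List[int]]) -> int:
    # Hypothetical stand-in for a backward pass that reads the first state
    # unconditionally (compare `states_list[0][3].size` in the traceback).
    states_list = list(node_map.items())
    return states_list[0][0]
```

With a toy graph whose conv2d nodes use the name "nn.conv2d", passing target_ops={"nn.conv2d"} yields {1: [], 3: [1]}, while a mismatched name such as the hypothetical "hypothetical_conv2d" yields {} and backward_first_state then fails with the same IndexError. Checking that the mapping is non-empty before the backward pass would turn this into a clearer error message.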

  3. OK, but could you tell us which ones? For example, SSD models can be run through object_detection_sample_ssd.