Are Intel Graphics really supported in TVM?

Hello all, I have been trying in vein to get a single example run with intel graphics, both under windows and linux. The last time I tried in an ubuntu 18.04, (windows 10 WSL actually) I encountered these errors :

This is the snippet taken from here :


#%% [markdown]
# in  the name of Allah the most compassionate the most merciful
# To add a new cell, type '# %%'
# To add a new markdown cell, type '# %% [markdown]'
# %%
# from IPython import get_ipython

# %%
# get_ipython().run_line_magic('matplotlib', 'inline')

# %% [markdown]
# 
# 
# Quick Start Tutorial for Compiling Deep Learning Models
# ======================================================
# **Author**: `Yao Wang <https://github.com/kevinthesun>`_, `Truman Tian <https://github.com/SiNZeRo>`_
# 
# This example shows how to build a neural network with Relay python frontend and
# generates a runtime library for Nvidia GPU with TVM.
# Notice that you need to build TVM with cuda and llvm enabled.
# 
# 
# %% [markdown]
# Overview for Supported Hardware Backend of TVM
# ----------------------------------------------
# The image below shows hardware backend currently supported by TVM:
# 
# ![](https://github.com/dmlc/web-data/raw/master/tvm/tutorial/tvm_support_list.png)
# 
#      :align: center
#      :scale: 100%
# 
# In this tutorial, we'll choose cuda and llvm as target backends.
# To begin with, let's import Relay and TVM.
# 
# 

# %%
import numpy as np

from tvm import relay
from tvm.relay import testing
import tvm
from tvm.contrib import graph_runtime

# %% [markdown]
# Define Neural Network in Relay
# -----------------------------
# First, let's define a neural network with relay python frontend.
# For simplicity, we'll use pre-defined resnet-18 network in Relay.
# Parameters are initialized with Xavier initializer.
# Relay also supports other model formats such as MXNet, CoreML, ONNX and
# Tensorflow.
# 
# In this tutorial, we assume we will do inference on our device
# and the batch size is set to be 1. Input images are RGB color
# images of size 224 * 224. We can call the :any:`tvm.relay.expr.astext()`
# to show the network structure.
# 
# 

# %%
batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

mod, params = relay.testing.resnet.get_workload(
    num_layers=18, batch_size=batch_size, image_shape=image_shape)

# set show_meta_data=True if you want to show meta data
print(mod.astext(show_meta_data=False))

# %% [markdown]
# Compilation
# -----------
# Next step is to compile the model using the Relay/TVM pipeline.
# Users can specify the optimization level of the compilation.
# Currently this value can be 0 to 3. The optimization passes include
# operator fusion, pre-computation, layout transformation and so on.
# 
# :any:`relay.build_module.build` returns three components: the execution graph in
# json format, the TVM module library of compiled functions specifically
# for this graph on the target hardware, and the parameter blobs of
# the model. During the compilation, Relay does the graph-level
# optimization while TVM does the tensor-level optimization, resulting
# in an optimized runtime module for model serving.
# 
# We'll first compile for Nvidia GPU. Behind the scene, `relay.build_module.build`
# first does a number of graph-level optimizations, e.g. pruning, fusing, etc.,
# then registers the operators (i.e. the nodes of the optimized graphs) to
# TVM implementations to generate a `tvm.module`.
# To generate the module library, TVM will first transfer the high level IR
# into the lower intrinsic IR of the specified target backend, which is CUDA
# in this example. Then the machine code will be generated as the module library.
# 
# 

# %%

opt_level = 3
target = tvm.target.intel_graphics() # model='Intel(R) Iris(R) Pro Graphics 580'
print(f'opt_level :{opt_level}')
print(f'target: {target}')
with relay.build_config(opt_level=opt_level):
    graph, lib, params = relay.build_module.build(mod,
                                                  target,
                                                  params=params)

# %% [markdown]
# Run the generate library
# ------------------------
# Now we can create graph runtime and run the module on Nvidia GPU.
# 
# 

# %%
# create random input
ctx = tvm.opencl() # for cuda this is gpu()
data = np.random.uniform(-1, 1, size=data_shape).astype("float32")
# create module
module = graph_runtime.create(graph, lib, ctx)
# set input and parameters
module.set_input("data", data)
module.set_input(**params)
# run
module.run()
# get output
out = module.get_output(0, tvm.nd.empty(out_shape)).asnumpy()

# Print first 10 elements of output
print(out.flatten()[0:10])

# %% [markdown]
# Save and Load Compiled Module
# -----------------------------
# We can also save the graph, lib and parameters into files and load them
# back in deploy environment.
# 
# 

# %%
# save the graph, lib and params into separate files
from tvm.contrib import util

temp = util.tempdir()
path_lib = temp.relpath("deploy_lib.tar")
lib.export_library(path_lib)
with open(temp.relpath("deploy_graph.json"), "w") as fo:
    fo.write(graph)
with open(temp.relpath("deploy_param.params"), "wb") as fo:
    fo.write(relay.save_param_dict(params))
print(temp.listdir())


# %%
# load the module back.
loaded_json = open(temp.relpath("deploy_graph.json")).read()
loaded_lib = tvm.module.load(path_lib)
loaded_params = bytearray(open(temp.relpath("deploy_param.params"), "rb").read())
input_data = tvm.nd.array(np.random.uniform(size=data_shape).astype("float32"))

module = graph_runtime.create(loaded_json, loaded_lib, ctx)
module.load_params(loaded_params)
module.run(data=input_data)
out_deploy = module.get_output(0).asnumpy()

# Print first 10 elements of output
print(out_deploy.flatten()[0:10])

# check whether the output from deployed module is consistent with original one
tvm.testing.assert_allclose(out_deploy, out, atol=1e-3)


# %%

and these are the errors :

(base) user@F3S-s3user:/mnt/d/Codes/tvm_testbed$ env PTVSD_LAUNCHER_PORT=49216 /home/user/anaconda3/bin/python /home/user/.vscode-server/extensions/ms-python.python-2020.1.58038/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /mnt/d/Codes/tvm_testbed/tvm_test1.py 
v0.0.4
def @main(%data: Tensor[(1, 3, 224, 224), float32], %bn_data_gamma: Tensor[(3), float32], %bn_data_beta: Tensor[(3), float32], %bn_data_moving_mean: Tensor[(3), float32], %bn_data_moving_var: Tensor[(3), float32], %conv0_weight: Tensor[(64, 3, 7, 7), float32], %bn0_gamma: Tensor[(64), float32], %bn0_beta: Tensor[(64), float32], %bn0_moving_mean: Tensor[(64), float32], %bn0_moving_var: Tensor[(64), float32], %stage1_unit1_bn1_gamma: Tensor[(64), float32], %stage1_unit1_bn1_beta: Tensor[(64), float32], %stage1_unit1_bn1_moving_mean: Tensor[(64), float32], %stage1_unit1_bn1_moving_var: Tensor[(64), float32], %stage1_unit1_conv1_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit1_bn2_gamma: Tensor[(64), float32], %stage1_unit1_bn2_beta: Tensor[(64), float32], %stage1_unit1_bn2_moving_mean: Tensor[(64), float32], %stage1_unit1_bn2_moving_var: Tensor[(64), float32], %stage1_unit1_conv2_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit1_sc_weight: Tensor[(64, 64, 1, 1), float32], %stage1_unit2_bn1_gamma: Tensor[(64), float32], %stage1_unit2_bn1_beta: Tensor[(64), float32], %stage1_unit2_bn1_moving_mean: Tensor[(64), float32], %stage1_unit2_bn1_moving_var: Tensor[(64), float32], %stage1_unit2_conv1_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit2_bn2_gamma: Tensor[(64), float32], %stage1_unit2_bn2_beta: Tensor[(64), float32], %stage1_unit2_bn2_moving_mean: Tensor[(64), float32], %stage1_unit2_bn2_moving_var: Tensor[(64), float32], %stage1_unit2_conv2_weight: Tensor[(64, 64, 3, 3), float32], %stage2_unit1_bn1_gamma: Tensor[(64), float32], %stage2_unit1_bn1_beta: Tensor[(64), float32], %stage2_unit1_bn1_moving_mean: Tensor[(64), float32], %stage2_unit1_bn1_moving_var: Tensor[(64), float32], %stage2_unit1_conv1_weight: Tensor[(128, 64, 3, 3), float32], %stage2_unit1_bn2_gamma: Tensor[(128), float32], %stage2_unit1_bn2_beta: Tensor[(128), float32], %stage2_unit1_bn2_moving_mean: Tensor[(128), float32], %stage2_unit1_bn2_moving_var: Tensor[(128), float32], %stage2_unit1_conv2_weight: Tensor[(128, 128, 3, 3), float32], %stage2_unit1_sc_weight: Tensor[(128, 64, 1, 1), float32], %stage2_unit2_bn1_gamma: Tensor[(128), float32], %stage2_unit2_bn1_beta: Tensor[(128), float32], %stage2_unit2_bn1_moving_mean: Tensor[(128), float32], %stage2_unit2_bn1_moving_var: Tensor[(128), float32], %stage2_unit2_conv1_weight: Tensor[(128, 128, 3, 3), float32], %stage2_unit2_bn2_gamma: Tensor[(128), float32], %stage2_unit2_bn2_beta: Tensor[(128), float32], %stage2_unit2_bn2_moving_mean: Tensor[(128), float32], %stage2_unit2_bn2_moving_var: Tensor[(128), float32], %stage2_unit2_conv2_weight: Tensor[(128, 128, 3, 3), float32], %stage3_unit1_bn1_gamma: Tensor[(128), float32], %stage3_unit1_bn1_beta: Tensor[(128), float32], %stage3_unit1_bn1_moving_mean: Tensor[(128), float32], %stage3_unit1_bn1_moving_var: Tensor[(128), float32], %stage3_unit1_conv1_weight: Tensor[(256, 128, 3, 3), float32], %stage3_unit1_bn2_gamma: Tensor[(256), float32], %stage3_unit1_bn2_beta: Tensor[(256), float32], %stage3_unit1_bn2_moving_mean: Tensor[(256), float32], %stage3_unit1_bn2_moving_var: Tensor[(256), float32], %stage3_unit1_conv2_weight: Tensor[(256, 256, 3, 3), float32], %stage3_unit1_sc_weight: Tensor[(256, 128, 1, 1), float32], %stage3_unit2_bn1_gamma: Tensor[(256), float32], %stage3_unit2_bn1_beta: Tensor[(256), float32], %stage3_unit2_bn1_moving_mean: Tensor[(256), float32], %stage3_unit2_bn1_moving_var: Tensor[(256), float32], %stage3_unit2_conv1_weight: Tensor[(256, 256, 3, 3), float32], %stage3_unit2_bn2_gamma: Tensor[(256), float32], %stage3_unit2_bn2_beta: Tensor[(256), float32], %stage3_unit2_bn2_moving_mean: Tensor[(256), float32], %stage3_unit2_bn2_moving_var: Tensor[(256), float32], %stage3_unit2_conv2_weight: Tensor[(256, 256, 3, 3), float32], %stage4_unit1_bn1_gamma: Tensor[(256), float32], %stage4_unit1_bn1_beta: Tensor[(256), float32], %stage4_unit1_bn1_moving_mean: Tensor[(256), float32], %stage4_unit1_bn1_moving_var: Tensor[(256), float32], %stage4_unit1_conv1_weight: Tensor[(512, 256, 3, 3), float32], %stage4_unit1_bn2_gamma: Tensor[(512), float32], %stage4_unit1_bn2_beta: Tensor[(512), float32], %stage4_unit1_bn2_moving_mean: Tensor[(512), float32], %stage4_unit1_bn2_moving_var: Tensor[(512), float32], %stage4_unit1_conv2_weight: Tensor[(512, 512, 3, 3), float32], %stage4_unit1_sc_weight: Tensor[(512, 256, 1, 1), float32], %stage4_unit2_bn1_gamma: Tensor[(512), float32], %stage4_unit2_bn1_beta: Tensor[(512), float32], %stage4_unit2_bn1_moving_mean: Tensor[(512), float32], %stage4_unit2_bn1_moving_var: Tensor[(512), float32], %stage4_unit2_conv1_weight: Tensor[(512, 512, 3, 3), float32], %stage4_unit2_bn2_gamma: Tensor[(512), float32], %stage4_unit2_bn2_beta: Tensor[(512), float32], %stage4_unit2_bn2_moving_mean: Tensor[(512), float32], %stage4_unit2_bn2_moving_var: Tensor[(512), float32], %stage4_unit2_conv2_weight: Tensor[(512, 512, 3, 3), float32], %bn1_gamma: Tensor[(512), float32], %bn1_beta: Tensor[(512), float32], %bn1_moving_mean: Tensor[(512), float32], %bn1_moving_var: Tensor[(512), float32], %fc1_weight: Tensor[(1000, 512), float32], %fc1_bias: Tensor[(1000), float32]) -> Tensor[(1, 1000), float32] {
  %0 = nn.batch_norm(%data, %bn_data_gamma, %bn_data_beta, %bn_data_moving_mean, %bn_data_moving_var, epsilon=2e-05f, scale=False) /* ty=(Tensor[(1, 3, 224, 224), float32], Tensor[(3), float32], Tensor[(3), float32]) */;
  %1 = %0.0;
  %2 = nn.conv2d(%1, %conv0_weight, strides=[2, 2], padding=[3, 3, 3, 3], channels=64, kernel_size=[7, 7]) /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %3 = nn.batch_norm(%2, %bn0_gamma, %bn0_beta, %bn0_moving_mean, %bn0_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 112, 112), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %4 = %3.0;
  %5 = nn.relu(%4) /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %6 = nn.max_pool2d(%5, pool_size=[3, 3], strides=[2, 2], padding=[1, 1]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %7 = nn.batch_norm(%6, %stage1_unit1_bn1_gamma, %stage1_unit1_bn1_beta, %stage1_unit1_bn1_moving_mean, %stage1_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %8 = %7.0;
  %9 = nn.relu(%8) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %10 = nn.conv2d(%9, %stage1_unit1_conv1_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %11 = nn.batch_norm(%10, %stage1_unit1_bn2_gamma, %stage1_unit1_bn2_beta, %stage1_unit1_bn2_moving_mean, %stage1_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %12 = %11.0;
  %13 = nn.relu(%12) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %14 = nn.conv2d(%13, %stage1_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %15 = nn.conv2d(%9, %stage1_unit1_sc_weight, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %16 = add(%14, %15) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %17 = nn.batch_norm(%16, %stage1_unit2_bn1_gamma, %stage1_unit2_bn1_beta, %stage1_unit2_bn1_moving_mean, %stage1_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %18 = %17.0;
  %19 = nn.relu(%18) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %20 = nn.conv2d(%19, %stage1_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %21 = nn.batch_norm(%20, %stage1_unit2_bn2_gamma, %stage1_unit2_bn2_beta, %stage1_unit2_bn2_moving_mean, %stage1_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %22 = %21.0;
  %23 = nn.relu(%22) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %24 = nn.conv2d(%23, %stage1_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %25 = add(%24, %16) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %26 = nn.batch_norm(%25, %stage2_unit1_bn1_gamma, %stage2_unit1_bn1_beta, %stage2_unit1_bn1_moving_mean, %stage2_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %27 = %26.0;
  %28 = nn.relu(%27) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %29 = nn.conv2d(%28, %stage2_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %30 = nn.batch_norm(%29, %stage2_unit1_bn2_gamma, %stage2_unit1_bn2_beta, %stage2_unit1_bn2_moving_mean, %stage2_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %31 = %30.0;
  %32 = nn.relu(%31) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %33 = nn.conv2d(%32, %stage2_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %34 = nn.conv2d(%28, %stage2_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=128, kernel_size=[1, 1]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %35 = add(%33, %34) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %36 = nn.batch_norm(%35, %stage2_unit2_bn1_gamma, %stage2_unit2_bn1_beta, %stage2_unit2_bn1_moving_mean, %stage2_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %37 = %36.0;
  %38 = nn.relu(%37) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %39 = nn.conv2d(%38, %stage2_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %40 = nn.batch_norm(%39, %stage2_unit2_bn2_gamma, %stage2_unit2_bn2_beta, %stage2_unit2_bn2_moving_mean, %stage2_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %41 = %40.0;
  %42 = nn.relu(%41) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %43 = nn.conv2d(%42, %stage2_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %44 = add(%43, %35) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %45 = nn.batch_norm(%44, %stage3_unit1_bn1_gamma, %stage3_unit1_bn1_beta, %stage3_unit1_bn1_moving_mean, %stage3_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %46 = %45.0;
  %47 = nn.relu(%46) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %48 = nn.conv2d(%47, %stage3_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %49 = nn.batch_norm(%48, %stage3_unit1_bn2_gamma, %stage3_unit1_bn2_beta, %stage3_unit1_bn2_moving_mean, %stage3_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %50 = %49.0;
  %51 = nn.relu(%50) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %52 = nn.conv2d(%51, %stage3_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %53 = nn.conv2d(%47, %stage3_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %54 = add(%52, %53) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %55 = nn.batch_norm(%54, %stage3_unit2_bn1_gamma, %stage3_unit2_bn1_beta, %stage3_unit2_bn1_moving_mean, %stage3_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %56 = %55.0;
  %57 = nn.relu(%56) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %58 = nn.conv2d(%57, %stage3_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %59 = nn.batch_norm(%58, %stage3_unit2_bn2_gamma, %stage3_unit2_bn2_beta, %stage3_unit2_bn2_moving_mean, %stage3_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %60 = %59.0;
  %61 = nn.relu(%60) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %62 = nn.conv2d(%61, %stage3_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %63 = add(%62, %54) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %64 = nn.batch_norm(%63, %stage4_unit1_bn1_gamma, %stage4_unit1_bn1_beta, %stage4_unit1_bn1_moving_mean, %stage4_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %65 = %64.0;
  %66 = nn.relu(%65) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %67 = nn.conv2d(%66, %stage4_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %68 = nn.batch_norm(%67, %stage4_unit1_bn2_gamma, %stage4_unit1_bn2_beta, %stage4_unit1_bn2_moving_mean, %stage4_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %69 = %68.0;
  %70 = nn.relu(%69) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %71 = nn.conv2d(%70, %stage4_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %72 = nn.conv2d(%66, %stage4_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %73 = add(%71, %72) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %74 = nn.batch_norm(%73, %stage4_unit2_bn1_gamma, %stage4_unit2_bn1_beta, %stage4_unit2_bn1_moving_mean, %stage4_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %75 = %74.0;
  %76 = nn.relu(%75) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %77 = nn.conv2d(%76, %stage4_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %78 = nn.batch_norm(%77, %stage4_unit2_bn2_gamma, %stage4_unit2_bn2_beta, %stage4_unit2_bn2_moving_mean, %stage4_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %79 = %78.0;
  %80 = nn.relu(%79) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %81 = nn.conv2d(%80, %stage4_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %82 = add(%81, %73) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %83 = nn.batch_norm(%82, %bn1_gamma, %bn1_beta, %bn1_moving_mean, %bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %84 = %83.0;
  %85 = nn.relu(%84) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %86 = nn.global_avg_pool2d(%85) /* ty=Tensor[(1, 512, 1, 1), float32] */;
  %87 = nn.batch_flatten(%86) /* ty=Tensor[(1, 512), float32] */;
  %88 = nn.dense(%87, %fc1_weight, units=1000) /* ty=Tensor[(1, 1000), float32] */;
  %89 = nn.bias_add(%88, %fc1_bias, axis=-1) /* ty=Tensor[(1, 1000), float32] */;
  nn.softmax(%89) /* ty=Tensor[(1, 1000), float32] */
}

opt_level :3
target: opencl -device=intel_graphics -model=unknown
...100%, 0.04 MB, 27 KB/s, 1 seconds passed
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 3, 224, 224, 'float32'), (64, 3, 7, 7, 'float32'), (2, 2), (3, 3, 3, 3), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Traceback (most recent call last):

  File "/mnt/d/Codes/tvm_testbed/tvm_test1.py", line 109, in <module>
    params=params)

  File "/home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/relay/build_module.py", line 248, in build
    graph_json, mod, params = bld_mod.build(func, target, target_host, params)

  File "/home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/relay/build_module.py", line 118, in build
    self._build(func, target, target_host)

  File "tvm/_ffi/_cython/./packed_func.pxi", line 308, in tvm._ffi._cy3.core.PackedFuncBase.__call__

  File "tvm/_ffi/_cython/./packed_func.pxi", line 243, in tvm._ffi._cy3.core.FuncCall

  File "tvm/_ffi/_cython/./packed_func.pxi", line 232, in tvm._ffi._cy3.core.FuncCall3

  File "tvm/_ffi/_cython/./base.pxi", line 159, in tvm._ffi._cy3.core.CALL

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ForwardRewriter::VisitExpr_(tvm::relay::CallNode const*)+0x6ca) [0x7f339e2a6bba]
  [bt] (7) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)+0x9e) [0x7f339e3fbbce]
  [bt] (6) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x9a) [0x7f339e2a2e7a]
  [bt] (5) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>*)#6}::_FUN(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>*)+0x27) [0x7f339e2a0ec7]
  [bt] (4) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ForwardRewriter::VisitExpr_(tvm::relay::CallNode const*)+0xa05) [0x7f339e2a6ef5]
  [bt] (3) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::RelayExpr (tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)>::AssignTypedLambda<tvm::RelayExpr (*)(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)>(tvm::RelayExpr (*)(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xb0) [0x7f339e2436b0]
  [bt] (2) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::RelayExpr tvm::relay::LayoutRewriter<tvm::relay::alter_op_layout::AlterTransformMemorizer>(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)+0xbba) [0x7f339e24155a]
  [bt] (1) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::alter_op_layout::AlterTransformMemorizer::CallWithNewLayouts(tvm::relay::Call const&, std::vector<tvm::RelayExpr, std::allocator<tvm::RelayExpr> > const&)+0x5cb) [0x7f339e23fbbb]
  [bt] (0) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0x134815b) [0x7f339e4c015b]
  File "tvm/_ffi/_cython/./packed_func.pxi", line 54, in tvm._ffi._cy3.core.tvm_callback
  File "/home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/relay/op/nn/_nn.py", line 270, in alter_op_layout_conv2d
    return topi.nn.conv2d_alter_layout(attrs, inputs, tinfos, op)
  File "</home/user/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-37>", line 2, in conv2d_alter_layout
  File "/home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/target.py", line 381, in dispatch_func
    return dispatch_dict[k](*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/topi-0.7.dev0-py3.7.egg/topi/intel_graphics/conv2d.py", line 238, in _alter_conv2d_layout
    ic_bn = cfg["tile_ic"].val if hasattr(cfg["tile_ic"], "val") else cfg["tile_ic"].size[-1]
  File "/home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/autotvm/task/space.py", line 808, in __getitem__
    return self._entity_map[name]
KeyError: 'tile_ic'


if I’m not mistaken, this is along with the findings of this thread, but I’m not sure as the last post in that thread is several months old without a proper answer!

I’d greatly appreciate if someone could shed some light on this. Thank you all very much in advance

It might be a bug in the TOPI schedule. The fallback schedule will not have the entity tile_ic if out_channel % 16 == 0 (code). Could you check your model to verify this hypothesis?

cc @Laurawly @kevinthesun

This is caused by a recent commit: https://github.com/apache/incubator-tvm/commit/b528acc143dd8a09b322ba0845743e18ae206e22#diff-f42aee9848a02ce033e17e390757f49bL48. I’ll reverse the intel_graphics/conv2d.py changes and it should fix the tile_ic key error.

2 Likes

PR #4853 merged.

1 Like

Thanks a lot. that error is gone, however, I get new errors : Is this also a bug ?

v0.0.4
def @main(%data: Tensor[(1, 3, 224, 224), float32], %bn_data_gamma: Tensor[(3), float32], %bn_data_beta: Tensor[(3), float32], %bn_data_moving_mean: Tensor[(3), float32], %bn_data_moving_var: Tensor[(3), float32], %conv0_weight: Tensor[(64, 3, 7, 7), float32], %bn0_gamma: Tensor[(64), float32], %bn0_beta: Tensor[(64), float32], %bn0_moving_mean: Tensor[(64), float32], %bn0_moving_var: Tensor[(64), float32], %stage1_unit1_bn1_gamma: Tensor[(64), float32], %stage1_unit1_bn1_beta: Tensor[(64), float32], %stage1_unit1_bn1_moving_mean: Tensor[(64), float32], %stage1_unit1_bn1_moving_var: Tensor[(64), float32], %stage1_unit1_conv1_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit1_bn2_gamma: Tensor[(64), float32], %stage1_unit1_bn2_beta: Tensor[(64), float32], %stage1_unit1_bn2_moving_mean: Tensor[(64), float32], %stage1_unit1_bn2_moving_var: Tensor[(64), float32], %stage1_unit1_conv2_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit1_sc_weight: Tensor[(64, 64, 1, 1), float32], %stage1_unit2_bn1_gamma: Tensor[(64), float32], %stage1_unit2_bn1_beta: Tensor[(64), float32], %stage1_unit2_bn1_moving_mean: Tensor[(64), float32], %stage1_unit2_bn1_moving_var: Tensor[(64), float32], %stage1_unit2_conv1_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit2_bn2_gamma: Tensor[(64), float32], %stage1_unit2_bn2_beta: Tensor[(64), float32], %stage1_unit2_bn2_moving_mean: Tensor[(64), float32], %stage1_unit2_bn2_moving_var: Tensor[(64), float32], %stage1_unit2_conv2_weight: Tensor[(64, 64, 3, 3), float32], %stage2_unit1_bn1_gamma: Tensor[(64), float32], %stage2_unit1_bn1_beta: Tensor[(64), float32], %stage2_unit1_bn1_moving_mean: Tensor[(64), float32], %stage2_unit1_bn1_moving_var: Tensor[(64), float32], %stage2_unit1_conv1_weight: Tensor[(128, 64, 3, 3), float32], %stage2_unit1_bn2_gamma: Tensor[(128), float32], %stage2_unit1_bn2_beta: Tensor[(128), float32], %stage2_unit1_bn2_moving_mean: Tensor[(128), float32], %stage2_unit1_bn2_moving_var: Tensor[(128), float32], %stage2_unit1_conv2_weight: Tensor[(128, 128, 3, 3), float32], %stage2_unit1_sc_weight: Tensor[(128, 64, 1, 1), float32], %stage2_unit2_bn1_gamma: Tensor[(128), float32], %stage2_unit2_bn1_beta: Tensor[(128), float32], %stage2_unit2_bn1_moving_mean: Tensor[(128), float32], %stage2_unit2_bn1_moving_var: Tensor[(128), float32], %stage2_unit2_conv1_weight: Tensor[(128, 128, 3, 3), float32], %stage2_unit2_bn2_gamma: Tensor[(128), float32], %stage2_unit2_bn2_beta: Tensor[(128), float32], %stage2_unit2_bn2_moving_mean: Tensor[(128), float32], %stage2_unit2_bn2_moving_var: Tensor[(128), float32], %stage2_unit2_conv2_weight: Tensor[(128, 128, 3, 3), float32], %stage3_unit1_bn1_gamma: Tensor[(128), float32], %stage3_unit1_bn1_beta: Tensor[(128), float32], %stage3_unit1_bn1_moving_mean: Tensor[(128), float32], %stage3_unit1_bn1_moving_var: Tensor[(128), float32], %stage3_unit1_conv1_weight: Tensor[(256, 128, 3, 3), float32], %stage3_unit1_bn2_gamma: Tensor[(256), float32], %stage3_unit1_bn2_beta: Tensor[(256), float32], %stage3_unit1_bn2_moving_mean: Tensor[(256), float32], %stage3_unit1_bn2_moving_var: Tensor[(256), float32], %stage3_unit1_conv2_weight: Tensor[(256, 256, 3, 3), float32], %stage3_unit1_sc_weight: Tensor[(256, 128, 1, 1), float32], %stage3_unit2_bn1_gamma: Tensor[(256), float32], %stage3_unit2_bn1_beta: Tensor[(256), float32], %stage3_unit2_bn1_moving_mean: Tensor[(256), float32], %stage3_unit2_bn1_moving_var: Tensor[(256), float32], %stage3_unit2_conv1_weight: Tensor[(256, 256, 3, 3), float32], %stage3_unit2_bn2_gamma: Tensor[(256), float32], %stage3_unit2_bn2_beta: Tensor[(256), float32], %stage3_unit2_bn2_moving_mean: Tensor[(256), float32], %stage3_unit2_bn2_moving_var: Tensor[(256), float32], %stage3_unit2_conv2_weight: Tensor[(256, 256, 3, 3), float32], %stage4_unit1_bn1_gamma: Tensor[(256), float32], %stage4_unit1_bn1_beta: Tensor[(256), float32], %stage4_unit1_bn1_moving_mean: Tensor[(256), float32], %stage4_unit1_bn1_moving_var: Tensor[(256), float32], %stage4_unit1_conv1_weight: Tensor[(512, 256, 3, 3), float32], %stage4_unit1_bn2_gamma: Tensor[(512), float32], %stage4_unit1_bn2_beta: Tensor[(512), float32], %stage4_unit1_bn2_moving_mean: Tensor[(512), float32], %stage4_unit1_bn2_moving_var: Tensor[(512), float32], %stage4_unit1_conv2_weight: Tensor[(512, 512, 3, 3), float32], %stage4_unit1_sc_weight: Tensor[(512, 256, 1, 1), float32], %stage4_unit2_bn1_gamma: Tensor[(512), float32], %stage4_unit2_bn1_beta: Tensor[(512), float32], %stage4_unit2_bn1_moving_mean: Tensor[(512), float32], %stage4_unit2_bn1_moving_var: Tensor[(512), float32], %stage4_unit2_conv1_weight: Tensor[(512, 512, 3, 3), float32], %stage4_unit2_bn2_gamma: Tensor[(512), float32], %stage4_unit2_bn2_beta: Tensor[(512), float32], %stage4_unit2_bn2_moving_mean: Tensor[(512), float32], %stage4_unit2_bn2_moving_var: Tensor[(512), float32], %stage4_unit2_conv2_weight: Tensor[(512, 512, 3, 3), float32], %bn1_gamma: Tensor[(512), float32], %bn1_beta: Tensor[(512), float32], %bn1_moving_mean: Tensor[(512), float32], %bn1_moving_var: Tensor[(512), float32], %fc1_weight: Tensor[(1000, 512), float32], %fc1_bias: Tensor[(1000), float32]) -> Tensor[(1, 1000), float32] {
  %0 = nn.batch_norm(%data, %bn_data_gamma, %bn_data_beta, %bn_data_moving_mean, %bn_data_moving_var, epsilon=2e-05f, scale=False) /* ty=(Tensor[(1, 3, 224, 224), float32], Tensor[(3), float32], Tensor[(3), float32]) */;
  %1 = %0.0;
  %2 = nn.conv2d(%1, %conv0_weight, strides=[2, 2], padding=[3, 3, 3, 3], channels=64, kernel_size=[7, 7]) /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %3 = nn.batch_norm(%2, %bn0_gamma, %bn0_beta, %bn0_moving_mean, %bn0_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 112, 112), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %4 = %3.0;
  %5 = nn.relu(%4) /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %6 = nn.max_pool2d(%5, pool_size=[3, 3], strides=[2, 2], padding=[1, 1]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %7 = nn.batch_norm(%6, %stage1_unit1_bn1_gamma, %stage1_unit1_bn1_beta, %stage1_unit1_bn1_moving_mean, %stage1_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %8 = %7.0;
  %9 = nn.relu(%8) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %10 = nn.conv2d(%9, %stage1_unit1_conv1_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %11 = nn.batch_norm(%10, %stage1_unit1_bn2_gamma, %stage1_unit1_bn2_beta, %stage1_unit1_bn2_moving_mean, %stage1_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %12 = %11.0;
  %13 = nn.relu(%12) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %14 = nn.conv2d(%13, %stage1_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %15 = nn.conv2d(%9, %stage1_unit1_sc_weight, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %16 = add(%14, %15) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %17 = nn.batch_norm(%16, %stage1_unit2_bn1_gamma, %stage1_unit2_bn1_beta, %stage1_unit2_bn1_moving_mean, %stage1_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %18 = %17.0;
  %19 = nn.relu(%18) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %20 = nn.conv2d(%19, %stage1_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %21 = nn.batch_norm(%20, %stage1_unit2_bn2_gamma, %stage1_unit2_bn2_beta, %stage1_unit2_bn2_moving_mean, %stage1_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %22 = %21.0;
  %23 = nn.relu(%22) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %24 = nn.conv2d(%23, %stage1_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %25 = add(%24, %16) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %26 = nn.batch_norm(%25, %stage2_unit1_bn1_gamma, %stage2_unit1_bn1_beta, %stage2_unit1_bn1_moving_mean, %stage2_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %27 = %26.0;
  %28 = nn.relu(%27) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %29 = nn.conv2d(%28, %stage2_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %30 = nn.batch_norm(%29, %stage2_unit1_bn2_gamma, %stage2_unit1_bn2_beta, %stage2_unit1_bn2_moving_mean, %stage2_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %31 = %30.0;
  %32 = nn.relu(%31) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %33 = nn.conv2d(%32, %stage2_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %34 = nn.conv2d(%28, %stage2_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=128, kernel_size=[1, 1]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %35 = add(%33, %34) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %36 = nn.batch_norm(%35, %stage2_unit2_bn1_gamma, %stage2_unit2_bn1_beta, %stage2_unit2_bn1_moving_mean, %stage2_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %37 = %36.0;
  %38 = nn.relu(%37) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %39 = nn.conv2d(%38, %stage2_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %40 = nn.batch_norm(%39, %stage2_unit2_bn2_gamma, %stage2_unit2_bn2_beta, %stage2_unit2_bn2_moving_mean, %stage2_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %41 = %40.0;
  %42 = nn.relu(%41) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %43 = nn.conv2d(%42, %stage2_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %44 = add(%43, %35) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %45 = nn.batch_norm(%44, %stage3_unit1_bn1_gamma, %stage3_unit1_bn1_beta, %stage3_unit1_bn1_moving_mean, %stage3_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %46 = %45.0;
  %47 = nn.relu(%46) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %48 = nn.conv2d(%47, %stage3_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %49 = nn.batch_norm(%48, %stage3_unit1_bn2_gamma, %stage3_unit1_bn2_beta, %stage3_unit1_bn2_moving_mean, %stage3_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %50 = %49.0;
  %51 = nn.relu(%50) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %52 = nn.conv2d(%51, %stage3_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %53 = nn.conv2d(%47, %stage3_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %54 = add(%52, %53) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %55 = nn.batch_norm(%54, %stage3_unit2_bn1_gamma, %stage3_unit2_bn1_beta, %stage3_unit2_bn1_moving_mean, %stage3_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %56 = %55.0;
  %57 = nn.relu(%56) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %58 = nn.conv2d(%57, %stage3_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %59 = nn.batch_norm(%58, %stage3_unit2_bn2_gamma, %stage3_unit2_bn2_beta, %stage3_unit2_bn2_moving_mean, %stage3_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %60 = %59.0;
  %61 = nn.relu(%60) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %62 = nn.conv2d(%61, %stage3_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %63 = add(%62, %54) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %64 = nn.batch_norm(%63, %stage4_unit1_bn1_gamma, %stage4_unit1_bn1_beta, %stage4_unit1_bn1_moving_mean, %stage4_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %65 = %64.0;
  %66 = nn.relu(%65) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %67 = nn.conv2d(%66, %stage4_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %68 = nn.batch_norm(%67, %stage4_unit1_bn2_gamma, %stage4_unit1_bn2_beta, %stage4_unit1_bn2_moving_mean, %stage4_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %69 = %68.0;
  %70 = nn.relu(%69) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %71 = nn.conv2d(%70, %stage4_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %72 = nn.conv2d(%66, %stage4_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %73 = add(%71, %72) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %74 = nn.batch_norm(%73, %stage4_unit2_bn1_gamma, %stage4_unit2_bn1_beta, %stage4_unit2_bn1_moving_mean, %stage4_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %75 = %74.0;
  %76 = nn.relu(%75) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %77 = nn.conv2d(%76, %stage4_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %78 = nn.batch_norm(%77, %stage4_unit2_bn2_gamma, %stage4_unit2_bn2_beta, %stage4_unit2_bn2_moving_mean, %stage4_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %79 = %78.0;
  %80 = nn.relu(%79) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %81 = nn.conv2d(%80, %stage4_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %82 = add(%81, %73) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %83 = nn.batch_norm(%82, %bn1_gamma, %bn1_beta, %bn1_moving_mean, %bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %84 = %83.0;
  %85 = nn.relu(%84) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %86 = nn.global_avg_pool2d(%85) /* ty=Tensor[(1, 512, 1, 1), float32] */;
  %87 = nn.batch_flatten(%86) /* ty=Tensor[(1, 512), float32] */;
  %88 = nn.dense(%87, %fc1_weight, units=1000) /* ty=Tensor[(1, 1000), float32] */;
  %89 = nn.bias_add(%88, %fc1_bias, axis=-1) /* ty=Tensor[(1, 1000), float32] */;
  nn.softmax(%89) /* ty=Tensor[(1, 1000), float32] */
}

opt_level :3
target: opencl -device=intel_graphics -model=unknown
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 3, 224, 224, 'float32'), (64, 3, 7, 7, 'float32'), (2, 2), (3, 3, 3, 3), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (64, 64, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (64, 64, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (128, 128, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (256, 256, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 512, 7, 7, 'float32'), (512, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('dense', (1, 512, 'float32'), (1000, 512, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
[15:41:10] /home/user/tvm/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
Traceback (most recent call last):

  File "/mnt/d/Codes/tvm_testbed/tvm_test1.py", line 123, in <module>
    module = graph_runtime.create(graph, lib, ctx)

  File "/home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/contrib/graph_runtime.py", line 59, in create
    return GraphModule(fcreate(graph_json_str, libmod, *device_type_id))

  File "tvm/_ffi/_cython/./packed_func.pxi", line 308, in tvm._ffi._cy3.core.PackedFuncBase.__call__

  File "tvm/_ffi/_cython/./packed_func.pxi", line 253, in tvm._ffi._cy3.core.FuncCall

  File "tvm/_ffi/_cython/./base.pxi", line 159, in tvm._ffi._cy3.core.CALL

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::DeviceAPIManager::GetAPI(int, bool)+0x17f) [0x7f7d8f0ac1ff]
  [bt] (7) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::DeviceAPIManager::GetAPI(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)+0xed) [0x7f7d8f0abcbd]
  [bt] (6) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0x13ff13c) [0x7f7d8f15313c]
  [bt] (5) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::gl::OpenGLWorkspace::Global()+0x55) [0x7f7d8f153065]
  [bt] (4) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::gl::OpenGLWorkspace::OpenGLWorkspace()+0x4d) [0x7f7d8f15030d]
  [bt] (3) /usr/lib/x86_64-linux-gnu/libglfw.so.3(glfwInit+0x2d) [0x7f7d8d6f5b3d]
  [bt] (2) /usr/lib/x86_64-linux-gnu/libglfw.so.3(+0xbdc5) [0x7f7d8d6fbdc5]
  [bt] (1) /usr/lib/x86_64-linux-gnu/libglfw.so.3(+0x59f7) [0x7f7d8d6f59f7]
  [bt] (0) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::gl::GlfwErrorCallback(int, char const*)+0x2f9) [0x7f7d8f14b169]
  File "/home/user/tvm/tvm/src/runtime/opengl/opengl_device_api.cc", line 81
Error: [65544] X11: The DISPLAY environment variable is missing

apart from all the can not find the config for opencl which I dont know why its giving me that, I can see [15:41:10] /home/user/tvm/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
and File β€œ/home/user/tvm/tvm/src/runtime/opengl/opengl_device_api.cc”, line 81 Error: [65544] X11: The DISPLAY environment variable is missing

both of which are installed ( I made sure I installed all dependencies! yet this complains about them regardless)

Any help is greatly appreciated

Did you use -X when ssh, and setup DISPLAY accordingly? To me this seems like a general X11 forwarding problem.

2 Likes

@comaniac : Thanks a lot for your kind reply. I dont have a display on that system! and why would tvm, need a display to work in first place?
I’m developing inside wsl, from vscode! and dont ssh into it!(it uses ssh to wsl internally) if having a display a must have, I also did install x11, and an x11 forwarder, VcXsrv, which works fine. I can execute different gui apps such as gedit, xeyes, … and they all show up just fine. however, if I try to run the code inside the wsl (without vscode and using simply python my_tvm_test.py after that I get these errors,


v0.0.4
def @main(%data: Tensor[(1, 3, 224, 224), float32], %bn_data_gamma: Tensor[(3), float32], %bn_data_beta: Tensor[(3), float32], %bn_data_moving_mean: Tensor[(3), float32], %bn_data_moving_var: Tensor[(3), float32], %conv0_weight: Tensor[(64, 3, 7, 7), float32], %bn0_gamma: Tensor[(64), float32], %bn0_beta: Tensor[(64), float32], %bn0_moving_mean: Tensor[(64), float32], %bn0_moving_var: Tensor[(64), float32], %stage1_unit1_bn1_gamma: Tensor[(64), float32], %stage1_unit1_bn1_beta: Tensor[(64), float32], %stage1_unit1_bn1_moving_mean: Tensor[(64), float32], %stage1_unit1_bn1_moving_var: Tensor[(64), float32], %stage1_unit1_conv1_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit1_bn2_gamma: Tensor[(64), float32], %stage1_unit1_bn2_beta: Tensor[(64), float32], %stage1_unit1_bn2_moving_mean: Tensor[(64), float32], %stage1_unit1_bn2_moving_var: Tensor[(64), float32], %stage1_unit1_conv2_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit1_sc_weight: Tensor[(64, 64, 1, 1), float32], %stage1_unit2_bn1_gamma: Tensor[(64), float32], %stage1_unit2_bn1_beta: Tensor[(64), float32], %stage1_unit2_bn1_moving_mean: Tensor[(64), float32], %stage1_unit2_bn1_moving_var: Tensor[(64), float32], %stage1_unit2_conv1_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit2_bn2_gamma: Tensor[(64), float32], %stage1_unit2_bn2_beta: Tensor[(64), float32], %stage1_unit2_bn2_moving_mean: Tensor[(64), float32], %stage1_unit2_bn2_moving_var: Tensor[(64), float32], %stage1_unit2_conv2_weight: Tensor[(64, 64, 3, 3), float32], %stage2_unit1_bn1_gamma: Tensor[(64), float32], %stage2_unit1_bn1_beta: Tensor[(64), float32], %stage2_unit1_bn1_moving_mean: Tensor[(64), float32], %stage2_unit1_bn1_moving_var: Tensor[(64), float32], %stage2_unit1_conv1_weight: Tensor[(128, 64, 3, 3), float32], %stage2_unit1_bn2_gamma: Tensor[(128), float32], %stage2_unit1_bn2_beta: Tensor[(128), float32], %stage2_unit1_bn2_moving_mean: Tensor[(128), float32], %stage2_unit1_bn2_moving_var: Tensor[(128), float32], %stage2_unit1_conv2_weight: Tensor[(128, 128, 3, 3), float32], %stage2_unit1_sc_weight: Tensor[(128, 64, 1, 1), float32], %stage2_unit2_bn1_gamma: Tensor[(128), float32], %stage2_unit2_bn1_beta: Tensor[(128), float32], %stage2_unit2_bn1_moving_mean: Tensor[(128), float32], %stage2_unit2_bn1_moving_var: Tensor[(128), float32], %stage2_unit2_conv1_weight: Tensor[(128, 128, 3, 3), float32], %stage2_unit2_bn2_gamma: Tensor[(128), float32], %stage2_unit2_bn2_beta: Tensor[(128), float32], %stage2_unit2_bn2_moving_mean: Tensor[(128), float32], %stage2_unit2_bn2_moving_var: Tensor[(128), float32], %stage2_unit2_conv2_weight: Tensor[(128, 128, 3, 3), float32], %stage3_unit1_bn1_gamma: Tensor[(128), float32], %stage3_unit1_bn1_beta: Tensor[(128), float32], %stage3_unit1_bn1_moving_mean: Tensor[(128), float32], %stage3_unit1_bn1_moving_var: Tensor[(128), float32], %stage3_unit1_conv1_weight: Tensor[(256, 128, 3, 3), float32], %stage3_unit1_bn2_gamma: Tensor[(256), float32], %stage3_unit1_bn2_beta: Tensor[(256), float32], %stage3_unit1_bn2_moving_mean: Tensor[(256), float32], %stage3_unit1_bn2_moving_var: Tensor[(256), float32], %stage3_unit1_conv2_weight: Tensor[(256, 256, 3, 3), float32], %stage3_unit1_sc_weight: Tensor[(256, 128, 1, 1), float32], %stage3_unit2_bn1_gamma: Tensor[(256), float32], %stage3_unit2_bn1_beta: Tensor[(256), float32], %stage3_unit2_bn1_moving_mean: Tensor[(256), float32], %stage3_unit2_bn1_moving_var: Tensor[(256), float32], %stage3_unit2_conv1_weight: Tensor[(256, 256, 3, 3), float32], %stage3_unit2_bn2_gamma: Tensor[(256), float32], %stage3_unit2_bn2_beta: Tensor[(256), float32], %stage3_unit2_bn2_moving_mean: Tensor[(256), float32], %stage3_unit2_bn2_moving_var: Tensor[(256), float32], %stage3_unit2_conv2_weight: Tensor[(256, 256, 3, 3), float32], %stage4_unit1_bn1_gamma: Tensor[(256), float32], %stage4_unit1_bn1_beta: Tensor[(256), float32], %stage4_unit1_bn1_moving_mean: Tensor[(256), float32], %stage4_unit1_bn1_moving_var: Tensor[(256), float32], %stage4_unit1_conv1_weight: Tensor[(512, 256, 3, 3), float32], %stage4_unit1_bn2_gamma: Tensor[(512), float32], %stage4_unit1_bn2_beta: Tensor[(512), float32], %stage4_unit1_bn2_moving_mean: Tensor[(512), float32], %stage4_unit1_bn2_moving_var: Tensor[(512), float32], %stage4_unit1_conv2_weight: Tensor[(512, 512, 3, 3), float32], %stage4_unit1_sc_weight: Tensor[(512, 256, 1, 1), float32], %stage4_unit2_bn1_gamma: Tensor[(512), float32], %stage4_unit2_bn1_beta: Tensor[(512), float32], %stage4_unit2_bn1_moving_mean: Tensor[(512), float32], %stage4_unit2_bn1_moving_var: Tensor[(512), float32], %stage4_unit2_conv1_weight: Tensor[(512, 512, 3, 3), float32], %stage4_unit2_bn2_gamma: Tensor[(512), float32], %stage4_unit2_bn2_beta: Tensor[(512), float32], %stage4_unit2_bn2_moving_mean: Tensor[(512), float32], %stage4_unit2_bn2_moving_var: Tensor[(512), float32], %stage4_unit2_conv2_weight: Tensor[(512, 512, 3, 3), float32], %bn1_gamma: Tensor[(512), float32], %bn1_beta: Tensor[(512), float32], %bn1_moving_mean: Tensor[(512), float32], %bn1_moving_var: Tensor[(512), float32], %fc1_weight: Tensor[(1000, 512), float32], %fc1_bias: Tensor[(1000), float32]) -> Tensor[(1, 1000), float32] {
  %0 = nn.batch_norm(%data, %bn_data_gamma, %bn_data_beta, %bn_data_moving_mean, %bn_data_moving_var, epsilon=2e-05f, scale=False) /* ty=(Tensor[(1, 3, 224, 224), float32], Tensor[(3), float32], Tensor[(3), float32]) */;
  %1 = %0.0;
  %2 = nn.conv2d(%1, %conv0_weight, strides=[2, 2], padding=[3, 3, 3, 3], channels=64, kernel_size=[7, 7]) /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %3 = nn.batch_norm(%2, %bn0_gamma, %bn0_beta, %bn0_moving_mean, %bn0_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 112, 112), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %4 = %3.0;
  %5 = nn.relu(%4) /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %6 = nn.max_pool2d(%5, pool_size=[3, 3], strides=[2, 2], padding=[1, 1]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %7 = nn.batch_norm(%6, %stage1_unit1_bn1_gamma, %stage1_unit1_bn1_beta, %stage1_unit1_bn1_moving_mean, %stage1_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %8 = %7.0;
  %9 = nn.relu(%8) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %10 = nn.conv2d(%9, %stage1_unit1_conv1_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %11 = nn.batch_norm(%10, %stage1_unit1_bn2_gamma, %stage1_unit1_bn2_beta, %stage1_unit1_bn2_moving_mean, %stage1_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %12 = %11.0;
  %13 = nn.relu(%12) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %14 = nn.conv2d(%13, %stage1_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %15 = nn.conv2d(%9, %stage1_unit1_sc_weight, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %16 = add(%14, %15) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %17 = nn.batch_norm(%16, %stage1_unit2_bn1_gamma, %stage1_unit2_bn1_beta, %stage1_unit2_bn1_moving_mean, %stage1_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %18 = %17.0;
  %19 = nn.relu(%18) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %20 = nn.conv2d(%19, %stage1_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %21 = nn.batch_norm(%20, %stage1_unit2_bn2_gamma, %stage1_unit2_bn2_beta, %stage1_unit2_bn2_moving_mean, %stage1_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %22 = %21.0;
  %23 = nn.relu(%22) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %24 = nn.conv2d(%23, %stage1_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %25 = add(%24, %16) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %26 = nn.batch_norm(%25, %stage2_unit1_bn1_gamma, %stage2_unit1_bn1_beta, %stage2_unit1_bn1_moving_mean, %stage2_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %27 = %26.0;
  %28 = nn.relu(%27) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %29 = nn.conv2d(%28, %stage2_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %30 = nn.batch_norm(%29, %stage2_unit1_bn2_gamma, %stage2_unit1_bn2_beta, %stage2_unit1_bn2_moving_mean, %stage2_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %31 = %30.0;
  %32 = nn.relu(%31) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %33 = nn.conv2d(%32, %stage2_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %34 = nn.conv2d(%28, %stage2_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=128, kernel_size=[1, 1]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %35 = add(%33, %34) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %36 = nn.batch_norm(%35, %stage2_unit2_bn1_gamma, %stage2_unit2_bn1_beta, %stage2_unit2_bn1_moving_mean, %stage2_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %37 = %36.0;
  %38 = nn.relu(%37) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %39 = nn.conv2d(%38, %stage2_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %40 = nn.batch_norm(%39, %stage2_unit2_bn2_gamma, %stage2_unit2_bn2_beta, %stage2_unit2_bn2_moving_mean, %stage2_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %41 = %40.0;
  %42 = nn.relu(%41) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %43 = nn.conv2d(%42, %stage2_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %44 = add(%43, %35) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %45 = nn.batch_norm(%44, %stage3_unit1_bn1_gamma, %stage3_unit1_bn1_beta, %stage3_unit1_bn1_moving_mean, %stage3_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %46 = %45.0;
  %47 = nn.relu(%46) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %48 = nn.conv2d(%47, %stage3_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %49 = nn.batch_norm(%48, %stage3_unit1_bn2_gamma, %stage3_unit1_bn2_beta, %stage3_unit1_bn2_moving_mean, %stage3_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %50 = %49.0;
  %51 = nn.relu(%50) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %52 = nn.conv2d(%51, %stage3_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %53 = nn.conv2d(%47, %stage3_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %54 = add(%52, %53) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %55 = nn.batch_norm(%54, %stage3_unit2_bn1_gamma, %stage3_unit2_bn1_beta, %stage3_unit2_bn1_moving_mean, %stage3_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %56 = %55.0;
  %57 = nn.relu(%56) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %58 = nn.conv2d(%57, %stage3_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %59 = nn.batch_norm(%58, %stage3_unit2_bn2_gamma, %stage3_unit2_bn2_beta, %stage3_unit2_bn2_moving_mean, %stage3_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %60 = %59.0;
  %61 = nn.relu(%60) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %62 = nn.conv2d(%61, %stage3_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %63 = add(%62, %54) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %64 = nn.batch_norm(%63, %stage4_unit1_bn1_gamma, %stage4_unit1_bn1_beta, %stage4_unit1_bn1_moving_mean, %stage4_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %65 = %64.0;
  %66 = nn.relu(%65) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %67 = nn.conv2d(%66, %stage4_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %68 = nn.batch_norm(%67, %stage4_unit1_bn2_gamma, %stage4_unit1_bn2_beta, %stage4_unit1_bn2_moving_mean, %stage4_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %69 = %68.0;
  %70 = nn.relu(%69) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %71 = nn.conv2d(%70, %stage4_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %72 = nn.conv2d(%66, %stage4_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %73 = add(%71, %72) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %74 = nn.batch_norm(%73, %stage4_unit2_bn1_gamma, %stage4_unit2_bn1_beta, %stage4_unit2_bn1_moving_mean, %stage4_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %75 = %74.0;
  %76 = nn.relu(%75) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %77 = nn.conv2d(%76, %stage4_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %78 = nn.batch_norm(%77, %stage4_unit2_bn2_gamma, %stage4_unit2_bn2_beta, %stage4_unit2_bn2_moving_mean, %stage4_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %79 = %78.0;
  %80 = nn.relu(%79) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %81 = nn.conv2d(%80, %stage4_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %82 = add(%81, %73) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %83 = nn.batch_norm(%82, %bn1_gamma, %bn1_beta, %bn1_moving_mean, %bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %84 = %83.0;
  %85 = nn.relu(%84) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %86 = nn.global_avg_pool2d(%85) /* ty=Tensor[(1, 512, 1, 1), float32] */;
  %87 = nn.batch_flatten(%86) /* ty=Tensor[(1, 512), float32] */;
  %88 = nn.dense(%87, %fc1_weight, units=1000) /* ty=Tensor[(1, 1000), float32] */;
  %89 = nn.bias_add(%88, %fc1_bias, axis=-1) /* ty=Tensor[(1, 1000), float32] */;
  nn.softmax(%89) /* ty=Tensor[(1, 1000), float32] */
}

opt_level :3
target: opencl -device=intel_graphics -model=unknown
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 3, 224, 224, 'float32'), (64, 3, 7, 7, 'float32'), (2, 2), (3, 3, 3, 3), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (64, 64, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (64, 64, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (128, 128, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (256, 256, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 512, 7, 7, 'float32'), (512, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('dense', (1, 512, 'float32'), (1000, 512, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
[15:38:33] /home/user/tvm/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
Traceback (most recent call last):

  File "tvm_test1.py", line 123, in <module>
    module = graph_runtime.create(graph, lib, ctx)

  File "/home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/contrib/graph_runtime.py", line 59, in create
    return GraphModule(fcreate(graph_json_str, libmod, *device_type_id))

  File "tvm/_ffi/_cython/./packed_func.pxi", line 308, in tvm._ffi._cy3.core.PackedFuncBase.__call__

  File "tvm/_ffi/_cython/./packed_func.pxi", line 253, in tvm._ffi._cy3.core.FuncCall

  File "tvm/_ffi/_cython/./base.pxi", line 159, in tvm._ffi._cy3.core.CALL

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::DeviceAPIManager::GetAPI(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)+0xed) [0x7ff6910fbcbd]
  [bt] (7) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0x13ff13c) [0x7ff6911a313c]
  [bt] (6) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::gl::OpenGLWorkspace::Global()+0x55) [0x7ff6911a3065]
  [bt] (5) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::gl::OpenGLWorkspace::OpenGLWorkspace()+0x4d) [0x7ff6911a030d]
  [bt] (4) /usr/lib/x86_64-linux-gnu/libglfw.so.3(glfwInit+0x2d) [0x7ff68f745b3d]
  [bt] (3) /usr/lib/x86_64-linux-gnu/libglfw.so.3(+0xb9d5) [0x7ff68f74b9d5]
  [bt] (2) /usr/lib/x86_64-linux-gnu/libglfw.so.3(+0x11499) [0x7ff68f751499]
  [bt] (1) /usr/lib/x86_64-linux-gnu/libglfw.so.3(+0x59f7) [0x7ff68f7459f7]
  [bt] (0) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::gl::GlfwErrorCallback(int, char const*)+0x2f9) [0x7ff69119b169]
  File "/home/user/tvm/tvm/src/runtime/opengl/opengl_device_api.cc", line 81
Error: [65544] Linux: Failed to watch for joystick connections in /dev/input: No such file or directory

What is wrong here? Why is it switching to opengl instead of opencl? (I have installed all opengl requirements by the way, so even that shouldnt have given any error imho!)

I never used TVM OpenCL flow nor OpenGL runtime, so I have no idea why and how it requires X11 forwarding. Maybe @Laurawly could help clarification.

Apart from that, if I remember correctly, you still need to set up everything for X11 forwarding even you are using WSL.

For the display error, can you try to do plt.savefig("ssd.png") instead of plt.show() in the tutorial to see if the error will be gone? The tutorial for ssd is under tutorials/frontend/deploy_ssd_gluoncv.py As for the opengl switch I’m not quite sure about the reason either.

I’m not doint any plt.show() I’m running this sphx-glr-tutorials-relay-quick-start-py example and only changing cuda to opencl. its weird but the error message I’m getting has changed to TVMError: Check failed: context != nullptr: No OpenCL device!(while there is, its my intel integrated GPU!)

v0.0.4
def @main(%data: Tensor[(1, 3, 224, 224), float32], %bn_data_gamma: Tensor[(3), float32], %bn_data_beta: Tensor[(3), float32], %bn_data_moving_mean: Tensor[(3), float32], %bn_data_moving_var: Tensor[(3), float32], %conv0_weight: Tensor[(64, 3, 7, 7), float32], %bn0_gamma: Tensor[(64), float32], %bn0_beta: Tensor[(64), float32], %bn0_moving_mean: Tensor[(64), float32], %bn0_moving_var: Tensor[(64), float32], %stage1_unit1_bn1_gamma: Tensor[(64), float32], %stage1_unit1_bn1_beta: Tensor[(64), float32], %stage1_unit1_bn1_moving_mean: Tensor[(64), float32], %stage1_unit1_bn1_moving_var: Tensor[(64), float32], %stage1_unit1_conv1_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit1_bn2_gamma: Tensor[(64), float32], %stage1_unit1_bn2_beta: Tensor[(64), float32], %stage1_unit1_bn2_moving_mean: Tensor[(64), float32], %stage1_unit1_bn2_moving_var: Tensor[(64), float32], %stage1_unit1_conv2_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit1_sc_weight: Tensor[(64, 64, 1, 1), float32], %stage1_unit2_bn1_gamma: Tensor[(64), float32], %stage1_unit2_bn1_beta: Tensor[(64), float32], %stage1_unit2_bn1_moving_mean: Tensor[(64), float32], %stage1_unit2_bn1_moving_var: Tensor[(64), float32], %stage1_unit2_conv1_weight: Tensor[(64, 64, 3, 3), float32], %stage1_unit2_bn2_gamma: Tensor[(64), float32], %stage1_unit2_bn2_beta: Tensor[(64), float32], %stage1_unit2_bn2_moving_mean: Tensor[(64), float32], %stage1_unit2_bn2_moving_var: Tensor[(64), float32], %stage1_unit2_conv2_weight: Tensor[(64, 64, 3, 3), float32], %stage2_unit1_bn1_gamma: Tensor[(64), float32], %stage2_unit1_bn1_beta: Tensor[(64), float32], %stage2_unit1_bn1_moving_mean: Tensor[(64), float32], %stage2_unit1_bn1_moving_var: Tensor[(64), float32], %stage2_unit1_conv1_weight: Tensor[(128, 64, 3, 3), float32], %stage2_unit1_bn2_gamma: Tensor[(128), float32], %stage2_unit1_bn2_beta: Tensor[(128), float32], %stage2_unit1_bn2_moving_mean: Tensor[(128), float32], %stage2_unit1_bn2_moving_var: Tensor[(128), float32], %stage2_unit1_conv2_weight: Tensor[(128, 128, 3, 3), float32], %stage2_unit1_sc_weight: Tensor[(128, 64, 1, 1), float32], %stage2_unit2_bn1_gamma: Tensor[(128), float32], %stage2_unit2_bn1_beta: Tensor[(128), float32], %stage2_unit2_bn1_moving_mean: Tensor[(128), float32], %stage2_unit2_bn1_moving_var: Tensor[(128), float32], %stage2_unit2_conv1_weight: Tensor[(128, 128, 3, 3), float32], %stage2_unit2_bn2_gamma: Tensor[(128), float32], %stage2_unit2_bn2_beta: Tensor[(128), float32], %stage2_unit2_bn2_moving_mean: Tensor[(128), float32], %stage2_unit2_bn2_moving_var: Tensor[(128), float32], %stage2_unit2_conv2_weight: Tensor[(128, 128, 3, 3), float32], %stage3_unit1_bn1_gamma: Tensor[(128), float32], %stage3_unit1_bn1_beta: Tensor[(128), float32], %stage3_unit1_bn1_moving_mean: Tensor[(128), float32], %stage3_unit1_bn1_moving_var: Tensor[(128), float32], %stage3_unit1_conv1_weight: Tensor[(256, 128, 3, 3), float32], %stage3_unit1_bn2_gamma: Tensor[(256), float32], %stage3_unit1_bn2_beta: Tensor[(256), float32], %stage3_unit1_bn2_moving_mean: Tensor[(256), float32], %stage3_unit1_bn2_moving_var: Tensor[(256), float32], %stage3_unit1_conv2_weight: Tensor[(256, 256, 3, 3), float32], %stage3_unit1_sc_weight: Tensor[(256, 128, 1, 1), float32], %stage3_unit2_bn1_gamma: Tensor[(256), float32], %stage3_unit2_bn1_beta: Tensor[(256), float32], %stage3_unit2_bn1_moving_mean: Tensor[(256), float32], %stage3_unit2_bn1_moving_var: Tensor[(256), float32], %stage3_unit2_conv1_weight: Tensor[(256, 256, 3, 3), float32], %stage3_unit2_bn2_gamma: Tensor[(256), float32], %stage3_unit2_bn2_beta: Tensor[(256), float32], %stage3_unit2_bn2_moving_mean: Tensor[(256), float32], %stage3_unit2_bn2_moving_var: Tensor[(256), float32], %stage3_unit2_conv2_weight: Tensor[(256, 256, 3, 3), float32], %stage4_unit1_bn1_gamma: Tensor[(256), float32], %stage4_unit1_bn1_beta: Tensor[(256), float32], %stage4_unit1_bn1_moving_mean: Tensor[(256), float32], %stage4_unit1_bn1_moving_var: Tensor[(256), float32], %stage4_unit1_conv1_weight: Tensor[(512, 256, 3, 3), float32], %stage4_unit1_bn2_gamma: Tensor[(512), float32], %stage4_unit1_bn2_beta: Tensor[(512), float32], %stage4_unit1_bn2_moving_mean: Tensor[(512), float32], %stage4_unit1_bn2_moving_var: Tensor[(512), float32], %stage4_unit1_conv2_weight: Tensor[(512, 512, 3, 3), float32], %stage4_unit1_sc_weight: Tensor[(512, 256, 1, 1), float32], %stage4_unit2_bn1_gamma: Tensor[(512), float32], %stage4_unit2_bn1_beta: Tensor[(512), float32], %stage4_unit2_bn1_moving_mean: Tensor[(512), float32], %stage4_unit2_bn1_moving_var: Tensor[(512), float32], %stage4_unit2_conv1_weight: Tensor[(512, 512, 3, 3), float32], %stage4_unit2_bn2_gamma: Tensor[(512), float32], %stage4_unit2_bn2_beta: Tensor[(512), float32], %stage4_unit2_bn2_moving_mean: Tensor[(512), float32], %stage4_unit2_bn2_moving_var: Tensor[(512), float32], %stage4_unit2_conv2_weight: Tensor[(512, 512, 3, 3), float32], %bn1_gamma: Tensor[(512), float32], %bn1_beta: Tensor[(512), float32], %bn1_moving_mean: Tensor[(512), float32], %bn1_moving_var: Tensor[(512), float32], %fc1_weight: Tensor[(1000, 512), float32], %fc1_bias: Tensor[(1000), float32]) -> Tensor[(1, 1000), float32] {
  %0 = nn.batch_norm(%data, %bn_data_gamma, %bn_data_beta, %bn_data_moving_mean, %bn_data_moving_var, epsilon=2e-05f, scale=False) /* ty=(Tensor[(1, 3, 224, 224), float32], Tensor[(3), float32], Tensor[(3), float32]) */;
  %1 = %0.0;
  %2 = nn.conv2d(%1, %conv0_weight, strides=[2, 2], padding=[3, 3, 3, 3], channels=64, kernel_size=[7, 7]) /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %3 = nn.batch_norm(%2, %bn0_gamma, %bn0_beta, %bn0_moving_mean, %bn0_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 112, 112), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %4 = %3.0;
  %5 = nn.relu(%4) /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %6 = nn.max_pool2d(%5, pool_size=[3, 3], strides=[2, 2], padding=[1, 1]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %7 = nn.batch_norm(%6, %stage1_unit1_bn1_gamma, %stage1_unit1_bn1_beta, %stage1_unit1_bn1_moving_mean, %stage1_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %8 = %7.0;
  %9 = nn.relu(%8) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %10 = nn.conv2d(%9, %stage1_unit1_conv1_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %11 = nn.batch_norm(%10, %stage1_unit1_bn2_gamma, %stage1_unit1_bn2_beta, %stage1_unit1_bn2_moving_mean, %stage1_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %12 = %11.0;
  %13 = nn.relu(%12) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %14 = nn.conv2d(%13, %stage1_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %15 = nn.conv2d(%9, %stage1_unit1_sc_weight, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %16 = add(%14, %15) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %17 = nn.batch_norm(%16, %stage1_unit2_bn1_gamma, %stage1_unit2_bn1_beta, %stage1_unit2_bn1_moving_mean, %stage1_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %18 = %17.0;
  %19 = nn.relu(%18) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %20 = nn.conv2d(%19, %stage1_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %21 = nn.batch_norm(%20, %stage1_unit2_bn2_gamma, %stage1_unit2_bn2_beta, %stage1_unit2_bn2_moving_mean, %stage1_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %22 = %21.0;
  %23 = nn.relu(%22) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %24 = nn.conv2d(%23, %stage1_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %25 = add(%24, %16) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %26 = nn.batch_norm(%25, %stage2_unit1_bn1_gamma, %stage2_unit1_bn1_beta, %stage2_unit1_bn1_moving_mean, %stage2_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 64, 56, 56), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %27 = %26.0;
  %28 = nn.relu(%27) /* ty=Tensor[(1, 64, 56, 56), float32] */;
  %29 = nn.conv2d(%28, %stage2_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %30 = nn.batch_norm(%29, %stage2_unit1_bn2_gamma, %stage2_unit1_bn2_beta, %stage2_unit1_bn2_moving_mean, %stage2_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %31 = %30.0;
  %32 = nn.relu(%31) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %33 = nn.conv2d(%32, %stage2_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %34 = nn.conv2d(%28, %stage2_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=128, kernel_size=[1, 1]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %35 = add(%33, %34) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %36 = nn.batch_norm(%35, %stage2_unit2_bn1_gamma, %stage2_unit2_bn1_beta, %stage2_unit2_bn1_moving_mean, %stage2_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %37 = %36.0;
  %38 = nn.relu(%37) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %39 = nn.conv2d(%38, %stage2_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %40 = nn.batch_norm(%39, %stage2_unit2_bn2_gamma, %stage2_unit2_bn2_beta, %stage2_unit2_bn2_moving_mean, %stage2_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %41 = %40.0;
  %42 = nn.relu(%41) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %43 = nn.conv2d(%42, %stage2_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %44 = add(%43, %35) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %45 = nn.batch_norm(%44, %stage3_unit1_bn1_gamma, %stage3_unit1_bn1_beta, %stage3_unit1_bn1_moving_mean, %stage3_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 128, 28, 28), float32], Tensor[(128), float32], Tensor[(128), float32]) */;
  %46 = %45.0;
  %47 = nn.relu(%46) /* ty=Tensor[(1, 128, 28, 28), float32] */;
  %48 = nn.conv2d(%47, %stage3_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %49 = nn.batch_norm(%48, %stage3_unit1_bn2_gamma, %stage3_unit1_bn2_beta, %stage3_unit1_bn2_moving_mean, %stage3_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %50 = %49.0;
  %51 = nn.relu(%50) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %52 = nn.conv2d(%51, %stage3_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %53 = nn.conv2d(%47, %stage3_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %54 = add(%52, %53) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %55 = nn.batch_norm(%54, %stage3_unit2_bn1_gamma, %stage3_unit2_bn1_beta, %stage3_unit2_bn1_moving_mean, %stage3_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %56 = %55.0;
  %57 = nn.relu(%56) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %58 = nn.conv2d(%57, %stage3_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %59 = nn.batch_norm(%58, %stage3_unit2_bn2_gamma, %stage3_unit2_bn2_beta, %stage3_unit2_bn2_moving_mean, %stage3_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %60 = %59.0;
  %61 = nn.relu(%60) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %62 = nn.conv2d(%61, %stage3_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %63 = add(%62, %54) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %64 = nn.batch_norm(%63, %stage4_unit1_bn1_gamma, %stage4_unit1_bn1_beta, %stage4_unit1_bn1_moving_mean, %stage4_unit1_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 256, 14, 14), float32], Tensor[(256), float32], Tensor[(256), float32]) */;
  %65 = %64.0;
  %66 = nn.relu(%65) /* ty=Tensor[(1, 256, 14, 14), float32] */;
  %67 = nn.conv2d(%66, %stage4_unit1_conv1_weight, strides=[2, 2], padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %68 = nn.batch_norm(%67, %stage4_unit1_bn2_gamma, %stage4_unit1_bn2_beta, %stage4_unit1_bn2_moving_mean, %stage4_unit1_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %69 = %68.0;
  %70 = nn.relu(%69) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %71 = nn.conv2d(%70, %stage4_unit1_conv2_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %72 = nn.conv2d(%66, %stage4_unit1_sc_weight, strides=[2, 2], padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %73 = add(%71, %72) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %74 = nn.batch_norm(%73, %stage4_unit2_bn1_gamma, %stage4_unit2_bn1_beta, %stage4_unit2_bn1_moving_mean, %stage4_unit2_bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %75 = %74.0;
  %76 = nn.relu(%75) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %77 = nn.conv2d(%76, %stage4_unit2_conv1_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %78 = nn.batch_norm(%77, %stage4_unit2_bn2_gamma, %stage4_unit2_bn2_beta, %stage4_unit2_bn2_moving_mean, %stage4_unit2_bn2_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %79 = %78.0;
  %80 = nn.relu(%79) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %81 = nn.conv2d(%80, %stage4_unit2_conv2_weight, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %82 = add(%81, %73) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %83 = nn.batch_norm(%82, %bn1_gamma, %bn1_beta, %bn1_moving_mean, %bn1_moving_var, epsilon=2e-05f) /* ty=(Tensor[(1, 512, 7, 7), float32], Tensor[(512), float32], Tensor[(512), float32]) */;
  %84 = %83.0;
  %85 = nn.relu(%84) /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %86 = nn.global_avg_pool2d(%85) /* ty=Tensor[(1, 512, 1, 1), float32] */;
  %87 = nn.batch_flatten(%86) /* ty=Tensor[(1, 512), float32] */;
  %88 = nn.dense(%87, %fc1_weight, units=1000) /* ty=Tensor[(1, 1000), float32] */;
  %89 = nn.bias_add(%88, %fc1_bias, axis=-1) /* ty=Tensor[(1, 1000), float32] */;
  nn.softmax(%89) /* ty=Tensor[(1, 1000), float32] */
}

opt_level :3
target: opencl -device=intel_graphics -model=unknown
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 3, 224, 224, 'float32'), (64, 3, 7, 7, 'float32'), (2, 2), (3, 3, 3, 3), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (64, 64, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (64, 64, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (128, 128, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (256, 256, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 512, 7, 7, 'float32'), (512, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -device=intel_graphics -model=unknown, workload=('dense', (1, 512, 'float32'), (1000, 512, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
[06:28:55] /home/user/tvm/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
[06:28:56] /home/user/tvm/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
Traceback (most recent call last):

  File "/mnt/d/Codes/tvm_testbed/tvm_test1.py", line 123, in <module>
    module = graph_runtime.create(graph, lib, ctx)

  File "/home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/contrib/graph_runtime.py", line 59, in create
    return GraphModule(fcreate(graph_json_str, libmod, *device_type_id))

  File "tvm/_ffi/_cython/./packed_func.pxi", line 308, in tvm._ffi._cy3.core.PackedFuncBase.__call__

  File "tvm/_ffi/_cython/./packed_func.pxi", line 253, in tvm._ffi._cy3.core.FuncCall

  File "tvm/_ffi/_cython/./base.pxi", line 159, in tvm._ffi._cy3.core.CALL

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (7) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x65) [0x7fd851f09525]
  [bt] (6) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0x13d4fa4) [0x7fd851f88fa4]
  [bt] (5) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(+0x13d4e44) [0x7fd851f88e44]
  [bt] (4) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::GraphRuntimeCreate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module const&, std::vector<DLContext, std::allocator<DLContext> > const&)+0xd3) [0x7fd851f88c43]
  [bt] (3) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::GraphRuntime::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLContext, std::allocator<DLContext> > const&)+0x239) [0x7fd851f888b9]
  [bt] (2) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::GraphRuntime::SetupStorage()+0x51d) [0x7fd851f86fad]
  [bt] (1) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::NDArray::Empty(std::vector<long, std::allocator<long> >, DLDataType, DLContext)+0x236) [0x7fd851f2e006]
  [bt] (0) /home/user/.local/lib/python3.7/site-packages/tvm-0.7.dev0-py3.7-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::cl::OpenCLWorkspace::AllocDataSpace(DLContext, unsigned long, unsigned long, DLDataType)+0x6e6) [0x7fd851f98cf6]
  File "/home/user/tvm/tvm/src/runtime/opencl/opencl_device_api.cc", line 123
TVMError: Check failed: context != nullptr: No OpenCL device

Thanks a lot for your kind help, its really greatly appreciated

I also tried the example you mentioned, only to get these errors :

(base) user@F3S:/mnt/d/Codes/tvm_testbed$ env PTVSD_LAUNCHER_PORT=65275 /home/user/anaconda3/bin/python /home/user/.vscode-server/extensions/ms-python.python-2020.2.62710/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /mnt/d/Codes/tvm_testbed/deploy_ssd_gluoncv.py 
Downloading from url https://github.com/dmlc/web-data/blob/master/gluoncv/detection/street_small.jpg?raw=true to /home/user/.tvm_test_data/data/street_small.jpg
...100%, 0.12 MB, 34 KB/s, 3 seconds passed
/home/user/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py:1159: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
        data: None
  input_sym_arg_type = in_param.infer_type()[0]
Downloading /home/user/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 132723/132723 [00:18<00:00, 7372.02KB/s]
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 3, 512, 512, 'float32'), (64, 3, 7, 7, 'float32'), (2, 2), (3, 3, 3, 3), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 64, 128, 128, 'float32'), (64, 64, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 64, 128, 128, 'float32'), (64, 64, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 64, 128, 128, 'float32'), (256, 64, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 128, 128, 'float32'), (64, 256, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 128, 128, 'float32'), (128, 256, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 128, 64, 64, 'float32'), (128, 128, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 128, 64, 64, 'float32'), (512, 128, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 128, 128, 'float32'), (512, 256, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 64, 64, 'float32'), (128, 512, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 64, 64, 'float32'), (256, 512, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 32, 32, 'float32'), (256, 256, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 32, 32, 'float32'), (1024, 256, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 64, 64, 'float32'), (1024, 512, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 1024, 32, 32, 'float32'), (256, 1024, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 1024, 32, 32, 'float32'), (84, 1024, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 1024, 32, 32, 'float32'), (512, 1024, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 16, 16, 'float32'), (512, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 16, 16, 'float32'), (2048, 512, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 1024, 32, 32, 'float32'), (2048, 1024, 1, 1, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 2048, 16, 16, 'float32'), (512, 2048, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 2048, 16, 16, 'float32'), (126, 2048, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 16, 16, 'float32'), (512, 512, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 8, 8, 'float32'), (126, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 8, 8, 'float32'), (512, 512, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 8, 8, 'float32'), (512, 512, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 4, 4, 'float32'), (126, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 4, 4, 'float32'), (256, 512, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 4, 4, 'float32'), (256, 256, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 2, 2, 'float32'), (84, 256, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 2, 2, 'float32'), (256, 256, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 2, 2, 'float32'), (256, 256, 3, 3, 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 1, 1, 'float32'), (84, 256, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 1024, 32, 32, 'float32'), (16, 1024, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 2048, 16, 16, 'float32'), (24, 2048, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 8, 8, 'float32'), (24, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 4, 4, 'float32'), (24, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 2, 2, 'float32'), (16, 256, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 1, 1, 'float32'), (16, 256, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
[06:26:10] /home/user/tvm/tvm/src/te/schedule/bound.cc:127: not in feed graph consumer = hybrid(hybrid_nms, 0x7fc970704d90)
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-user'
: CommandLine Error: Option 'help-list' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

It seems there’s something wrong with you intel graphics compilation setup. I’m not quite familiar with that. But maybe you can try to run some sample opencl code Intel officially provided. And when all that works, you can come back and try to run tvm code on intel graphics.

1 Like

Also there’re some related posts here: LLVM error "option ... registered more than once" while loading libtvm.so regarding your error.

1 Like

Thank you very much I really appreciate it. I’ll look into this and update you with the results.