Error quantizing a TensorFlow model (Module vs Function changes?)


#1

Hi,

I am trying to quantize a TensorFlow model using previously working code. However, after pulling the latest changes I get the following error:

  File "/home/tvm/tvm-framework/vru/vru_one_look.py", line 386, in tvm
    qgraph = relay.quantize.quantize(mod, params)
  File "/home/tvm/tvm/python/tvm/relay/quantize/quantize.py", line 340, in quantize
    graph = _bind_params(graph, params)
  File "/home/tvm/tvm/python/tvm/relay/quantize/quantize.py", line 299, in _bind_params
    for arg in func.params:
  File "tvm/_ffi/_cython/./node.pxi", line 81, in tvm._ffi._cy3.core.NodeBase.__getattr__
AttributeError: '<class 'tvm.relay.module.Module'>' object has no attribute 'params'

Is this somehow related to the recent module vs. function changes? I saw that the TensorFlow tutorial was changed from

graph, lib, params = relay.build(mod[mod.entry_func], target=target, params=params)

to

graph, lib, params = relay.build(mod, target=target, params=params)

I wonder if this is related.

@vinx13 any ideas?

Thanks


#2

+1
I am also experiencing this issue with the ONNX and MXNet Relay frontends.

Thanks,
Ben


#3

Yes, there have been some recent API changes. Quantize the main function and assign it back into the module, then build the module:

mod['main'] = relay.quantize.quantize(mod['main'], params)
graph, lib, params = relay.build(mod, target=target)
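For context, a minimal end-to-end sketch of the module-based flow might look like the following. The frontend call, symbol/parameter names, and input shape are placeholder assumptions (here using the MXNet frontend, since that is what the reports below involve); only the two lines above are from this thread.

```python
import tvm
from tvm import relay

# Import the model; recent frontends return a tvm.relay.module.Module,
# not a bare Function. `sym`, `arg_params`, `aux_params`, and the input
# shape below are placeholders for your own model.
mod, params = relay.frontend.from_mxnet(
    sym,
    shape={"data": (1, 3, 224, 224)},
    arg_params=arg_params,
    aux_params=aux_params,
)

# Quantize the main function and put the result back into the module.
mod['main'] = relay.quantize.quantize(mod['main'], params)

# Build the whole module; the old mod[mod.entry_func] form is no longer used.
graph, lib, params = relay.build(mod, target="llvm")
```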

#4

We should also change the quantization API to be consistent with the rest (take a module as input).


#6

It would also be nice if someone could add a quantization tutorial, since I believe a bit more documentation is needed in this area.


#7

@vinx13 Hi, I followed the approach you suggested with the MXNet frontend, but got a long error message.
(BTW, I ran tvm-cuda-int8-benchmark successfully, and noticed that from_mxnet returns <class 'tvm.relay.expr.Function'> in that example, while it returns <class 'tvm.relay.module.Module'> in my use case when building a Gluon model, which is inconsistent with the docstring of from_mxnet in tvm/python/relay/frontend/mxnet.py.)

Traceback (most recent call last):
  File "test_tvm_gen.py", line 22, in <module>
    main()
  File "test_tvm_gen.py", line 17, in main
    model = TVMOptimize(getattr(sample_config, model_name))
  File "tvm_gen.py", line 45, in __init__
    self.compile()
  File "tvm_gen.py", line 82, in compile
    self.net['main'] = relay.quantize.quantize(self.net['main'], params=self.params)
  File "tvm/python/tvm/relay/quantize/quantize.py", line 366, in quantize
    mod = quantize_seq(mod)
  File "tvm/python/tvm/relay/transform.py", line 185, in __call__
    return _transform.RunPass(self, mod)
  File "tvm/python/tvm/_ffi/_ctypes/function.py", line 210, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) tvm/lib/Release/libtvm.so(+0xa162f4) [0x7f27195bc2f4]
  [bt] (7) tvm/lib/Release/libtvm.so(+0xa161ac) [0x7f27195bc1ac]
  [bt] (6) tvm/lib/Release/libtvm.so(tvm::relay::transform::SequentialNode::operator()(tvm::relay::Module const&, tvm::relay::transform::PassContext const&) const+0x3a1) [0x7f27195bbd21]
  [bt] (5) tvm/lib/Release/libtvm.so(tvm::relay::transform::FunctionPassNode::operator()(tvm::relay::Module const&, tvm::relay::transform::PassContext const&) const+0x2f8) [0x7f27195ba528]
  [bt] (4) tvm/lib/Release/libtvm.so(tvm::relay::ModuleNode::Add(tvm::relay::GlobalVar const&, tvm::relay::Function const&, bool)+0x581) [0x7f27193abe51]
  [bt] (3) tvm/lib/Release/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&)+0x361) [0x7f27195f2e81]
  [bt] (2) tvm/lib/Release/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::relay::Expr)+0x71) [0x7f27195f1fb1]
  [bt] (1) tvm/lib/Release/libtvm.so(tvm::relay::ErrorReporter::RenderErrors(tvm::relay::Module const&, bool)+0x12fb) [0x7f271937aeab]
  [bt] (0) tvm/lib/Release/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f2718f4d2e3]
  [bt] (8) tvm/lib/Release/libtvm.so(+0xa161ac) [0x7f27195bc1ac]
  [bt] (7) tvm/lib/Release/libtvm.so(tvm::relay::transform::SequentialNode::operator()(tvm::relay::Module const&, tvm::relay::transform::PassContext const&) const+0x3a1) [0x7f27195bbd21]
  [bt] (6) tvm/lib/Release/libtvm.so(tvm::relay::transform::FunctionPassNode::operator()(tvm::relay::Module const&, tvm::relay::transform::PassContext const&) const+0x2f8) [0x7f27195ba528]
  [bt] (5) tvm/lib/Release/libtvm.so(tvm::relay::ModuleNode::Add(tvm::relay::GlobalVar const&, tvm::relay::Function const&, bool)+0x581) [0x7f27193abe51]
  [bt] (4) tvm/lib/Release/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&)+0x361) [0x7f27195f2e81]
  [bt] (3) tvm/lib/Release/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::relay::Expr)+0x55) [0x7f27195f1f95]
  [bt] (2) tvm/lib/Release/libtvm.so(tvm::relay::TypeSolver::Solve()+0xc0e) [0x7f271960964e]
  [bt] (1) tvm/lib/Release/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<bool (tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::AssignTypedLambda<bool (*)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>(bool (*)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xd6) [0x7f27193eeca6]
  [bt] (0) tvm/lib/Release/libtvm.so(tvm::relay::BroadcastRel(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)+0x44d) [0x7f2719508abd]
  File "tvm/src/relay/ir/error.cc", line 133
TVMError: 
Error(s) have occurred. The program has been annotated with them:

In `main`: 
v0.0.3
fn (%data: Tensor[(1, 3, 224, 224), float32]) -> Tensor[(1, 1000), float32] {
  %0 = multiply(%data, 16f);
  %1 = round(%0);
  %2 = clip(%1, a_min=-127, a_max=127);
  %3 = cast(%2, dtype="int8");
  %4 = multiply(meta[relay.Constant][0], 512f);
  %5 = round(%4);
  %6 = clip(%5, a_min=-127, a_max=127);
  %7 = cast(%6, dtype="int8");
  %8 = nn.conv2d(%3, %7, strides=[2, 2], padding=[3, 3], channels=64, kernel_size=[7, 7], out_dtype="int32");
  %9 = multiply(meta[relay.Constant][1], 128f);
  %10 = round(%9);
  %11 = clip(%10, a_min=-127, a_max=127);
  %12 = cast(%11, dtype="int32");
  %13 = left_shift(%12, 6);
  %14 = add(%8, %13);
  %15 = nn.relu(%14);
  %16 = add(%15, 256);
  %17 = right_shift(%16, 9);
  %18 = clip(%17, a_min=-127, a_max=127);
  %19 = cast(%18, dtype="int8");
  %20 = nn.max_pool2d(%19, pool_size=[3, 3], strides=[2, 2], padding=[1, 1]);
  %21 = clip(%20, a_min=-127, a_max=127);
  %22 = multiply(meta[relay.Constant][2], 128f);
  %23 = round(%22);
  %24 = clip(%23, a_min=-127, a_max=127);
  %25 = cast(%24, dtype="int8");
  %26 = nn.conv2d(%21, %25, padding=[1, 1], channels=64, kernel_size=[3, 3], out_dtype="int32");
  %27 = multiply(meta[relay.Constant][3], 64f);
  %28 = round(%27);
  %29 = clip(%28, a_min=-127, a_max=127);
  %30 = cast(%29, dtype="int32");
  %31 = left_shift(%30, 5);
  %32 = add(%26, %31);
  %33 = nn.relu(%32);
  %34 = add(%33, 64);
  %35 = right_shift(%34, 7);
  %36 = clip(%35, a_min=-127, a_max=127);
  %37 = cast(%36, dtype="int8");
  %38 = annotation.stop_fusion(%37);
  %39 = multiply(meta[relay.Constant][4], 64f);
  %40 = round(%39);
  %41 = clip(%40, a_min=-127, a_max=127);
  %42 = cast(%41, dtype="int8");
  %43 = nn.conv2d(%38, %42, padding=[1, 1], channels=64, kernel_size=[3, 3], out_dtype="int32");
  %44 = multiply(meta[relay.Constant][5], 64f);
  %45 = round(%44);
  %46 = clip(%45, a_min=-127, a_max=127);
  %47 = cast(%46, dtype="int32");
  %48 = left_shift(%47, 4);
  %49 = add(%43, %48);
  %50 = cast(%49, dtype="int8");
  %51 = cast(%21, dtype="int8");
  %52 = annotation.stop_fusion(%51);
  %53 = cast(%52, dtype="int8");
  %54 = left_shift(%53, 6) an internal invariant was violated while typechecking your program [14:58:51] /home/chongzhao/Flexiv/flexiv_3rdparty/3rdparty/tvm/src/relay/op/type_relations.cc:120: Check failed: t0->dtype == t1->dtype (int8 vs. int32) : 
; ;
  %55 = add(%50, %54);
  %56 = nn.relu(%55);
  %57 = add(%56, 32);
  %58 = right_shift(%57, 6);
  %59 = clip(%58, a_min=-127, a_max=127);
  %60 = cast(%59, dtype="int8");
  %61 = annotation.stop_fusion(%60);
  %62 = multiply(meta[relay.Constant][6], 256f);
  %63 = round(%62);
  %64 = clip(%63, a_min=-127, a_max=127);
  %65 = cast(%64, dtype="int8");
  %66 = nn.conv2d(%61, %65, padding=[1, 1], channels=64, kernel_size=[3, 3], out_dtype="int32");
  %67 = multiply(meta[relay.Constant][7], 64f);
  %68 = round(%67);
  %69 = clip(%68, a_min=-127, a_max=127);
  %70 = cast(%69, dtype="int32");
  %71 = left_shift(%70, 6);
  %72 = add(%66, %71);
  %73 = nn.relu(%72);
  %74 = add(%73, 128);
  %75 = right_shift(%74, 8);
  %76 = clip(%75, a_min=-127, a_max=127);
  %77 = cast(%76, dtype="int8");
  %78 = annotation.stop_fusion(%77);
  %79 = multiply(meta[relay.Constant][8], 64f);
  %80 = round(%79);
  %81 = clip(%80, a_min=-127, a_max=127);
  %82 = cast(%81, dtype="int8");
  %83 = nn.conv2d(%78, %82, padding=[1, 1], channels=64, kernel_size=[3, 3], out_dtype="int32");
  %84 = multiply(meta[relay.Constant][9], 128f);
  %85 = round(%84);
  %86 = clip(%85, a_min=-127, a_max=127);
  %87 = cast(%86, dtype="int32");
  %88 = left_shift(%87, 3);
  %89 = add(%83, %88);
  %90 = add(%89, 32);
  %91 = right_shift(%90, 6);
  %92 = clip(%91, a_min=-127, a_max=127);
  %93 = cast(%92, dtype="int8");
  %94 = annotation.stop_fusion(%93);
  %95 = cast(%59, dtype="int8");
  %96 = annotation.stop_fusion(%95);
  %97 = add(%94, %96);
  %98 = nn.relu(%97);
  %99 = clip(%98, a_min=-127, a_max=127);
  %100 = multiply(meta[relay.Constant][10], 512f);
  %101 = round(%100);
  %102 = clip(%101, a_min=-127, a_max=127);
  %103 = cast(%102, dtype="int8");
  %104 = nn.conv2d(%99, %103, strides=[2, 2], padding=[1, 1], channels=128, kernel_size=[3, 3], out_dtype="int32");
  %105 = multiply(meta[relay.Constant][11], 128f);
  %106 = round(%105);
  %107 = clip(%106, a_min=-127, a_max=127);
  %108 = cast(%107, dtype="int32");
  %109 = left_shift(%108, 6);
  %110 = add(%104, %109);
  %111 = nn.relu(%110);
  %112 = add(%111, 256);
  %113 = right_shift(%112, 9);
  %114 = clip(%113, a_min=-127, a_max=127);
  %115 = cast(%114, dtype="int8");
  %116 = annotation.stop_fusion(%115);
  %117 = multiply(meta[relay.Constant][12], 128f);
  %118 = round(%117);
  %119 = clip(%118, a_min=-127, a_max=127);
  %120 = cast(%119, dtype="int8");
  %121 = nn.conv2d(%116, %120, padding=[1, 1], channels=128, kernel_size=[3, 3], out_dtype="int32");
  %122 = multiply(meta[relay.Constant][13], 128f);
  %123 = round(%122);
  %124 = clip(%123, a_min=-127, a_max=127);
  %125 = cast(%124, dtype="int32");
  %126 = left_shift(%125, 4);
  %127 = add(%121, %126);
  %128 = add(%127, 64);
  %129 = right_shift(%128, 7);
  %130 = clip(%129, a_min=-127, a_max=127);
  %131 = cast(%130, dtype="int8");
  %132 = annotation.stop_fusion(%131);
  %133 = multiply(meta[relay.Constant][14], 128f);
  %134 = round(%133);
  %135 = clip(%134, a_min=-127, a_max=127);
  %136 = cast(%135, dtype="int8");
  %137 = nn.conv2d(%99, %136, strides=[2, 2], channels=128, kernel_size=[1, 1], out_dtype="int32");
  %138 = multiply(meta[relay.Constant][15], 128f);
  %139 = round(%138);
  %140 = clip(%139, a_min=-127, a_max=127);
  %141 = cast(%140, dtype="int32");
  %142 = left_shift(%141, 4);
  %143 = add(%137, %142);
  %144 = add(%143, 64);
  %145 = right_shift(%144, 7);
  %146 = clip(%145, a_min=-127, a_max=127);
  %147 = cast(%146, dtype="int8");
  %148 = annotation.stop_fusion(%147);
  %149 = add(%132, %148);
  %150 = nn.relu(%149);
  %151 = clip(%150, a_min=-127, a_max=127);
  %152 = multiply(meta[relay.Constant][16], 128f);
  %153 = round(%152);
  %154 = clip(%153, a_min=-127, a_max=127);
  %155 = cast(%154, dtype="int8");
  %156 = nn.conv2d(%151, %155, padding=[1, 1], channels=128, kernel_size=[3, 3], out_dtype="int32");
  %157 = multiply(meta[relay.Constant][17], 128f);
  %158 = round(%157);
  %159 = clip(%158, a_min=-127, a_max=127);
  %160 = cast(%159, dtype="int32");
  %161 = left_shift(%160, 4);
  %162 = add(%156, %161);
  %163 = nn.relu(%162);
  %164 = add(%163, 64);
  %165 = right_shift(%164, 7);
  %166 = clip(%165, a_min=-127, a_max=127);
  %167 = cast(%166, dtype="int8");
  %168 = annotation.stop_fusion(%167);
  %169 = multiply(meta[relay.Constant][18], 64f);
  %170 = round(%169);
  %171 = clip(%170, a_min=-127, a_max=127);
  %172 = cast(%171, dtype="int8");
  %173 = nn.conv2d(%168, %172, padding=[1, 1], channels=128, kernel_size=[3, 3], out_dtype="int32");
  %174 = multiply(meta[relay.Constant][19], 64f);
  %175 = round(%174);
  %176 = clip(%175, a_min=-127, a_max=127);
  %177 = cast(%176, dtype="int32");
  %178 = left_shift(%177, 4);
  %179 = add(%173, %178);
  %180 = cast(%179, dtype="int8");
  %181 = cast(%151, dtype="int8");
  %182 = annotation.stop_fusion(%181);
  %183 = cast(%182, dtype="int8");
  %184 = left_shift(%183, 6);
  %185 = add(%180, %184);
  %186 = nn.relu(%185);
  %187 = add(%186, 32);
  %188 = right_shift(%187, 6);
  %189 = clip(%188, a_min=-127, a_max=127);
  %190 = cast(%189, dtype="int8");
  %191 = annotation.stop_fusion(%190);
  %192 = multiply(meta[relay.Constant][20], 256f);
  %193 = round(%192);
  %194 = clip(%193, a_min=-127, a_max=127);
  %195 = cast(%194, dtype="int8");
  %196 = nn.conv2d(%191, %195, strides=[2, 2], padding=[1, 1], channels=256, kernel_size=[3, 3], out_dtype="int32");
  %197 = multiply(meta[relay.Constant][21], 128f);
  %198 = round(%197);
  %199 = clip(%198, a_min=-127, a_max=127);
  %200 = cast(%199, dtype="int32");
  %201 = left_shift(%200, 5);
  %202 = add(%196, %201);
  %203 = nn.relu(%202);
  %204 = add(%203, 128);
  %205 = right_shift(%204, 8);
  %206 = clip(%205, a_min=-127, a_max=127);
  %207 = cast(%206, dtype="int8");
  %208 = annotation.stop_fusion(%207);
  %209 = multiply(meta[relay.Constant][22], 128f);
  %210 = round(%209);
  %211 = clip(%210, a_min=-127, a_max=127);
  %212 = cast(%211, dtype="int8");
  %213 = nn.conv2d(%208, %212, padding=[1, 1], channels=256, kernel_size=[3, 3], out_dtype="int32");
  %214 = multiply(meta[relay.Constant][23], 128f);
  %215 = round(%214);
  %216 = clip(%215, a_min=-127, a_max=127);
  %217 = cast(%216, dtype="int32");
  %218 = left_shift(%217, 4);
  %219 = add(%213, %218);
  %220 = add(%219, 64);
  %221 = right_shift(%220, 7);
  %222 = clip(%221, a_min=-127, a_max=127);
  %223 = cast(%222, dtype="int8");
  %224 = annotation.stop_fusion(%223);
  %225 = cast(%189, dtype="int8");
  %226 = annotation.stop_fusion(%225);
  %227 = multiply(meta[relay.Constant][24], 256f);
  %228 = round(%227);
  %229 = clip(%228, a_min=-127, a_max=127);
  %230 = cast(%229, dtype="int8");
  %231 = nn.conv2d(%226, %230, strides=[2, 2], channels=256, kernel_size=[1, 1], out_dtype="int32");
  %232 = multiply(meta[relay.Constant][25], 128f);
  %233 = round(%232);
  %234 = clip(%233, a_min=-127, a_max=127);
  %235 = cast(%234, dtype="int32");
  %236 = left_shift(%235, 5);
  %237 = add(%231, %236);
  %238 = add(%237, 128);
  %239 = right_shift(%238, 8);
  %240 = clip(%239, a_min=-127, a_max=127);
  %241 = cast(%240, dtype="int8");
  %242 = annotation.stop_fusion(%241);
  %243 = add(%224, %242);
  %244 = nn.relu(%243);
  %245 = clip(%244, a_min=-127, a_max=127);
  %246 = multiply(meta[relay.Constant][26], 256f);
  %247 = round(%246);
  %248 = clip(%247, a_min=-127, a_max=127);
  %249 = cast(%248, dtype="int8");
  %250 = nn.conv2d(%245, %249, padding=[1, 1], channels=256, kernel_size=[3, 3], out_dtype="int32");
  %251 = multiply(meta[relay.Constant][27], 128f);
  %252 = round(%251);
  %253 = clip(%252, a_min=-127, a_max=127);
  %254 = cast(%253, dtype="int32");
  %255 = left_shift(%254, 5);
  %256 = add(%250, %255);
  %257 = nn.relu(%256);
  %258 = add(%257, 128);
  %259 = right_shift(%258, 8);
  %260 = clip(%259, a_min=-127, a_max=127);
  %261 = cast(%260, dtype="int8");
  %262 = annotation.stop_fusion(%261);
  %263 = multiply(meta[relay.Constant][28], 128f);
  %264 = round(%263);
  %265 = clip(%264, a_min=-127, a_max=127);
  %266 = cast(%265, dtype="int8");
  %267 = nn.conv2d(%262, %266, padding=[1, 1], channels=256, kernel_size=[3, 3], out_dtype="int32");
  %268 = multiply(meta[relay.Constant][29], 64f);
  %269 = round(%268);
  %270 = clip(%269, a_min=-127, a_max=127);
  %271 = cast(%270, dtype="int32");
  %272 = left_shift(%271, 5);
  %273 = add(%267, %272);
  %274 = cast(%273, dtype="int8");
  %275 = cast(%245, dtype="int8");
  %276 = annotation.stop_fusion(%275);
  %277 = cast(%276, dtype="int8");
  %278 = left_shift(%277, 7);
  %279 = add(%274, %278);
  %280 = nn.relu(%279);
  %281 = add(%280, 64);
  %282 = right_shift(%281, 7);
  %283 = clip(%282, a_min=-127, a_max=127);
  %284 = cast(%283, dtype="int8");
  %285 = annotation.stop_fusion(%284);
  %286 = multiply(meta[relay.Constant][30], 256f);
  %287 = round(%286);
  %288 = clip(%287, a_min=-127, a_max=127);
  %289 = cast(%288, dtype="int8");
  %290 = nn.conv2d(%285, %289, strides=[2, 2], padding=[1, 1], channels=512, kernel_size=[3, 3], out_dtype="int32");
  %291 = multiply(meta[relay.Constant][31], 256f);
  %292 = round(%291);
  %293 = clip(%292, a_min=-127, a_max=127);
  %294 = cast(%293, dtype="int32");
  %295 = left_shift(%294, 4);
  %296 = add(%290, %295);
  %297 = nn.relu(%296);
  %298 = add(%297, 128);
  %299 = right_shift(%298, 8);
  %300 = clip(%299, a_min=-127, a_max=127);
  %301 = cast(%300, dtype="int8");
  %302 = annotation.stop_fusion(%301);
  %303 = multiply(meta[relay.Constant][32], 64f);
  %304 = round(%303);
  %305 = clip(%304, a_min=-127, a_max=127);
  %306 = cast(%305, dtype="int8");
  %307 = nn.conv2d(%302, %306, padding=[1, 1], channels=512, kernel_size=[3, 3], out_dtype="int32");
  %308 = multiply(meta[relay.Constant][33], 64f);
  %309 = round(%308);
  %310 = clip(%309, a_min=-127, a_max=127);
  %311 = cast(%310, dtype="int32");
  %312 = left_shift(%311, 4);
  %313 = add(%307, %312);
  %314 = add(%313, 32);
  %315 = right_shift(%314, 6);
  %316 = clip(%315, a_min=-127, a_max=127);
  %317 = cast(%316, dtype="int8");
  %318 = annotation.stop_fusion(%317);
  %319 = cast(%283, dtype="int8");
  %320 = annotation.stop_fusion(%319);
  %321 = multiply(meta[relay.Constant][34], 64f);
  %322 = round(%321);
  %323 = clip(%322, a_min=-127, a_max=127);
  %324 = cast(%323, dtype="int8");
  %325 = nn.conv2d(%320, %324, strides=[2, 2], channels=512, kernel_size=[1, 1], out_dtype="int32");
  %326 = multiply(meta[relay.Constant][35], 128f);
  %327 = round(%326);
  %328 = clip(%327, a_min=-127, a_max=127);
  %329 = cast(%328, dtype="int32");
  %330 = left_shift(%329, 3);
  %331 = add(%325, %330);
  %332 = add(%331, 32);
  %333 = right_shift(%332, 6);
  %334 = clip(%333, a_min=-127, a_max=127);
  %335 = cast(%334, dtype="int8");
  %336 = annotation.stop_fusion(%335);
  %337 = add(%318, %336);
  %338 = nn.relu(%337);
  %339 = clip(%338, a_min=-127, a_max=127);
  %340 = multiply(meta[relay.Constant][36], 256f);
  %341 = round(%340);
  %342 = clip(%341, a_min=-127, a_max=127);
  %343 = cast(%342, dtype="int8");
  %344 = nn.conv2d(%339, %343, padding=[1, 1], channels=512, kernel_size=[3, 3], out_dtype="int32");
  %345 = multiply(meta[relay.Constant][37], 256f);
  %346 = round(%345);
  %347 = clip(%346, a_min=-127, a_max=127);
  %348 = cast(%347, dtype="int32");
  %349 = left_shift(%348, 4);
  %350 = add(%344, %349);
  %351 = nn.relu(%350);
  %352 = add(%351, 128);
  %353 = right_shift(%352, 8);
  %354 = clip(%353, a_min=-127, a_max=127);
  %355 = cast(%354, dtype="int8");
  %356 = annotation.stop_fusion(%355);
  %357 = multiply(meta[relay.Constant][38], 16f);
  %358 = round(%357);
  %359 = clip(%358, a_min=-127, a_max=127);
  %360 = cast(%359, dtype="int8");
  %361 = nn.conv2d(%356, %360, padding=[1, 1], channels=512, kernel_size=[3, 3], out_dtype="int32");
  %362 = multiply(meta[relay.Constant][39], 16f);
  %363 = round(%362);
  %364 = clip(%363, a_min=-127, a_max=127);
  %365 = cast(%364, dtype="int32");
  %366 = left_shift(%365, 4);
  %367 = add(%361, %366);
  %368 = cast(%367, dtype="int8");
  %369 = cast(%339, dtype="int8");
  %370 = annotation.stop_fusion(%369);
  %371 = cast(%370, dtype="int8");
  %372 = left_shift(%371, 4);
  %373 = add(%368, %372);
  %374 = nn.relu(%373);
  %375 = add(%374, 8);
  %376 = right_shift(%375, 4);
  %377 = clip(%376, a_min=-127, a_max=127);
  %378 = cast(%377, dtype="int8");
  %379 = annotation.stop_fusion(%378);
  %380 = clip(%379, a_min=-127, a_max=127);
  %381 = cast(%380, dtype="int8");
  %382 = cast(%381, dtype="float32");
  %383 = multiply(%382, 0.0625f);
  %384 = nn.global_avg_pool2d(%383);
  %385 = nn.batch_flatten(%384);
  %386 = nn.batch_flatten(%385);
  %387 = nn.dense(%386, meta[relay.Constant][40], units=1000);
  add(%387, meta[relay.Constant][41])
}
// meta data omitted. you can use show_meta_data=True to include meta data

Thanks a lot!!!


#8

Try store_lowbit_output=False in the qconfig; there are likely some bugs in annotation.
The example in the tvm-cuda-int8-benchmark repo targets an older version of TVM; it needs to be updated to use the module API.
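As a sketch, the suggested option is set through the qconfig context manager, roughly like this (assuming `mod` and `params` come from a frontend import as above; other qconfig options are left at their defaults):

```python
from tvm import relay

# Disable low-bit output storage, as suggested above, to work around
# the suspected annotation bug; quantization must run inside the
# qconfig context for the option to take effect.
with relay.quantize.qconfig(store_lowbit_output=False):
    mod['main'] = relay.quantize.quantize(mod['main'], params)
```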