[Relay] nn.batch_norm fails with float16

cbalint13 · May 29, 2019, 7:06am

The following simple example fails in "float16" mode but works fine with "float32".

import tvm
import numpy as np
from tvm import relay
from tvm.relay import testing

dtype="float16"

data = relay.var("data", relay.TensorType((1, 3, 224, 224), dtype))
weight = relay.var("weight")
bn_gamma = relay.var("bn_gamma")
bn_beta = relay.var("bn_beta")
bn_mmean = relay.var("bn_mean")
bn_mvar = relay.var("bn_var")

simple_net = relay.nn.conv2d(data=data, weight=weight, kernel_size=(3,3), channels=16, padding=(1, 1))
simple_net = relay.nn.batch_norm(simple_net, bn_gamma, bn_beta, bn_mmean, bn_mvar)[0]
simple_net = relay.nn.relu(simple_net)
simple_net = relay.Function(relay.ir_pass.free_vars(simple_net), simple_net)

data_shape = (1, 3, 224, 224)
net, params = testing.create_workload(simple_net)

print(net.astext())

target = "cuda"
graph, lib, params = relay.build_module.build(net, target, params=params)

The error is:

In `main`: 
v0.0.1
fn () {
  add(meta[relay.Constant][0], 1e-05f)an internal invariant was violated while typechecking your program [02:07:15] /home/cbalint/rpmbuild/BUILD/tvm/src/relay/op/type_relations.cc:115: Check failed: t0->dtype == t1->dtype (float16 vs. float32) : 
;

Interestingly, epsilon seems to be unpacked as 10e-5 (the minimum value in float16) instead of 10e-6 (as it is declared by default in float32), so some float16 awareness appears to be there. I am not sure how to fix or work around the issue with these constants, or why the constant stays in "float32".
This happens with the latest master (as of 20-May-2019).

I would like to fix the issue. In the meantime, does anyone have a hint on a good place to start?

janimesh · May 29, 2019, 7:48am

There seems to be a bug in simplify_inference.
At https://github.com/dmlc/tvm/blob/master/src/relay/pass/simplify_inference.cc#L40 and https://github.com/dmlc/tvm/blob/master/src/relay/pass/simplify_inference.cc#L43, replace Float(32) with ttype->dtype.
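
To make the reasoning behind that suggestion concrete, here is a rough pure-Python sketch of the dtype check that trips in the error message. It is not TVM code: TensorType, check_add, epsilon_hardcoded and epsilon_from_input are hypothetical names made up for illustration, and the check is only a simplified model of the real type relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorType:
    """Hypothetical stand-in for a Relay tensor type: just shape and dtype."""
    shape: tuple
    dtype: str


def check_add(lhs, rhs):
    """Simplified model of the dtype check reported in the error above."""
    if lhs.dtype != rhs.dtype:
        raise TypeError(
            "Check failed: t0->dtype == t1->dtype (%s vs. %s)" % (lhs.dtype, rhs.dtype)
        )
    # A scalar (shape ()) broadcasts against the other operand's shape.
    return TensorType(lhs.shape if lhs.shape else rhs.shape, lhs.dtype)


def epsilon_hardcoded(input_type):
    """Assumed pre-fix behaviour: the scalar constant is always float32."""
    return TensorType((), "float32")


def epsilon_from_input(input_type):
    """Suggested behaviour: the scalar constant takes the input's dtype."""
    return TensorType((), input_type.dtype)


moving_var = TensorType((16,), "float16")

# With a hard-coded float32 epsilon, adding it to a float16 tensor is rejected.
try:
    check_add(moving_var, epsilon_hardcoded(moving_var))
    hardcoded_ok = True
except TypeError as err:
    hardcoded_ok = False
    print(err)

# With epsilon created in the input's dtype, the same addition type-checks.
fixed = check_add(moving_var, epsilon_from_input(moving_var))
print(fixed)
```

With a float32 input both variants pass the check, which would explain why the problem only shows up in "float16" mode.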

cbalint13 · May 30, 2019, 11:43am

@janimesh,

Thank you very much for the hint! In the end, I prepared PR 3260 to fix the issue.