Gluon model with prelu crash in relay.build


#1

A Gluon model with prelu built successfully with nnvm.compiler but crashed when I tried relay.build. When I deleted all prelu layers or replaced them with relu, everything worked fine.

Environment

  • system: Ubuntu 16.04
  • python: 3.6
  • tvm: 0.6.dev

Crash Information

[2019-04-01 21:50:25] Cannot find config for target=llvm -mcpu=core-avx2, workload=('conv2d', (1, 3, 224, 224, 'float32'), (10, 3, 3, 3, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
[2019-04-01 21:50:25] Cannot find config for target=llvm -mcpu=core-avx2, workload=('conv2d', (1, 10, 111, 111, 'float32'), (16, 10, 3, 3, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
[2019-04-01 21:50:25] Cannot find config for target=llvm -mcpu=core-avx2, workload=('conv2d', (1, 16, 109, 109, 'float32'), (32, 16, 3, 3, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
[2019-04-01 21:50:25] Cannot find config for target=llvm -mcpu=core-avx2, workload=('conv2d', (1, 32, 107, 107, 'float32'), (6, 32, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Traceback (most recent call last):
  File "tools/tvm/speed.py", line 108, in <module>
    speed()
  File "tools/tvm/speed.py", line 71, in speed
    graph, lib, params = relay.build(func, 'llvm -mcpu=core-avx2', params=params)
  File "/home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/relay/build_module.py", line 276, in build
    func = optimize(func, target, params)
  File "/home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/relay/build_module.py", line 209, in optimize
    func = ir_pass.fold_constant(func)
  File "/home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/relay/ir_pass.py", line 687, in fold_constant
    return _ir_pass.FoldConstant(expr)
  File "tvm/_ffi/_cython/./function.pxi", line 287, in tvm._ffi._cy3.core.FunctionBase.__call__
  File "tvm/_ffi/_cython/./function.pxi", line 222, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./function.pxi", line 211, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 151, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xd57fde) [0x7f8533080fde]
  [bt] (7) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xd56a6b) [0x7f853307fa6b]
  [bt] (6) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Expr const&, tvm::relay::Module const&)+0x3fd) [0x7f85330dd37d]
  [bt] (5) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xbdc316) [0x7f8532f05316]
  [bt] (4) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xbdb890) [0x7f8532f04890]
  [bt] (3) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&)+0x325) [0x7f85330dda35]
  [bt] (2) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xdb3f2a) [0x7f85330dcf2a]
  [bt] (1) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xbaf473) [0x7f8532ed8473]
  [bt] (0) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0x829442) [0x7f8532b52442]
  [bt] (8) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xbdb890) [0x7f8532f04890]
  [bt] (7) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&)+0x325) [0x7f85330dda35]
  [bt] (6) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xdb3d03) [0x7f85330dcd03]
  [bt] (5) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xdcc4c4) [0x7f85330f54c4]
  [bt] (4) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xc1fc24) [0x7f8532f48c24]
  [bt] (3) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0xcdb00a) [0x7f853300400a]
  [bt] (2) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::BijectiveLayout::ForwardShape(tvm::Array<HalideIR::Expr, void> const&) const+0x6e) [0x7f8532cb333e]
  [bt] (1) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0x98cc66) [0x7f8532cb5c66]
  [bt] (0) /home/yf/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0x829442) [0x7f8532b52442]
  File "/home/yf/.software/tvm/src/relay/ir/error.cc", line 112
TVMError:
Error(s) have occurred. We have annotated the program with them:

In `main`: 
v0.0.1
%1 = fn () {
  %0 = layout_transform(meta[relay.Constant][0] // , src_layout="NCHW", dst_layout="NCHW5c") // an internal invariant was violdated while typechecking your program [21:50:26] /home/yf/.software/tvm/src/lang/data_layout.cc:254: Check failed: src_shape.size() == src_axis.size() (1 vs. 4) : 
; 
  %0
}
%1
// meta data omitted. you can use show_meta_data=True to include meta data

Code to reproduce

import os
import tvm
import mxnet as mx
import nnvm.compiler
import tvm.relay as relay
from tvm.contrib import graph_runtime

from mxnet import gluon
from mxnet.gluon import nn as gnn

class Pnet(gluon.HybridBlock):
    """Proposal Network"""
    def __init__(self, size=12, **kwargs):
        super(Pnet, self).__init__(**kwargs)
        self.size = size
        self.base = gnn.HybridSequential()
        self.base.add(
            gnn.Conv2D(10, 3), gnn.PReLU(10),
            gnn.MaxPool2D(ceil_mode=True), # caffe default
            gnn.Conv2D(16, 3), gnn.PReLU(16),
            gnn.Conv2D(32, 3), gnn.PReLU(32))
        self.conv4_1 = gnn.Conv2D(2, 1)
        self.conv4_2 = gnn.Conv2D(4, 1)

    def hybrid_forward(self, F, x):
        x = self.base(x)
        cls_pred = self.conv4_1(x)
        bbx_pred = self.conv4_2(x)
        if not mx.autograd.is_training():
            cls_pred = F.softmax(cls_pred, axis=1)
        return cls_pred, bbx_pred

def test(network='pnet'):
    shape_dict = {'data': (1, 3, 224, 224)}
    x = mx.nd.random.randn(*shape_dict['data']).asnumpy()
    net = Pnet()
    net.initialize()
    net.hybridize()
    net(mx.nd.array(x))
    net.export(network, 0)
    sym, arg_params, aux_params = mx.model.load_checkpoint(network, 0)
    os.remove(network + '-symbol.json')
    os.remove(network + '-0000.params')
    func, params = relay.frontend.from_mxnet(sym, shape_dict, 'float32', arg_params, aux_params)
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(func, 'llvm -mcpu=core-avx2', params=params)
    mod = graph_runtime.create(graph, lib, tvm.cpu())
    mod.load_params(bytearray(relay.save_param_dict(params)))
    mod.run(data=x)

test()

#2

I ran into the same error. @kevinthesun @eqy


#3

Ping @kevinthesun @eqy, have you seen this notification? This should be easy to reproduce, and it means we can't use the NCHW[x]c layout.


#4

I am not really familiar with this particular use case, but can we isolate this by just turning off alter_op_layout?


#5

Yes, this can be worked around by turning off alter_op_layout. Interestingly, NNVM passes but Relay doesn't.


#6

This is caused by an incorrect FInferCorrectLayout attribute for prelu in Relay (it uses ElemwiseArbitraryLayout, which treats prelu as elementwise), whereas NNVM implements it correctly. We need to implement a similar FInferCorrectLayout for Relay's prelu.
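To see why prelu is not purely elementwise for layout purposes: its alpha parameter is a 1-D tensor with one slope per channel, so a data-layout transform such as NCHW -> NCHW4c must re-block alpha as well; applying the 4-D transform rule to the 1-D alpha is exactly the shape mismatch in the error above (src_shape.size() == 1 vs src_axis.size() == 4). A minimal numpy sketch (the blocking factor 4 and all shapes here are illustrative, not taken from the issue):

```python
import numpy as np

def prelu(x, alpha, axis=1):
    # PReLU: y = x where x > 0, else alpha * x, with alpha broadcast along `axis`
    shape = [1] * x.ndim
    shape[axis] = alpha.size
    return np.where(x > 0, x, alpha.reshape(shape) * x)

x = np.random.randn(1, 8, 4, 4).astype("float32")    # data in NCHW
alpha = np.full(8, 0.25, dtype="float32")             # one slope per channel

y = prelu(x, alpha, axis=1)

# Transform the data to a blocked layout, NCHW -> NCHW4c (C=8 split into 2 blocks of 4):
# reshape to (N, C_outer, c_inner, H, W), then move c_inner last.
x_blocked = x.reshape(1, 2, 4, 4, 4).transpose(0, 1, 3, 4, 2)

# alpha is 1-D over channels; the 4-D data transform cannot be applied to it.
# It needs its own re-blocking to (C_outer, c_inner) to stay consistent:
alpha_blocked = alpha.reshape(2, 4)
y_blocked = np.where(x_blocked > 0, x_blocked,
                     alpha_blocked[None, :, None, None, :] * x_blocked)

# Transforming back recovers the NCHW result
y_back = y_blocked.transpose(0, 1, 4, 2, 3).reshape(1, 8, 4, 4)
assert np.allclose(y, y_back)
```

A proper FInferCorrectLayout for prelu would encode exactly this: the data input follows the requested layout, while alpha only follows the channel dimension's sub-layout.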


#7

PR: https://github.com/dmlc/tvm/pull/3013
@kevinthesun @eqy