[Potential Bug] TVM crashes the second time when continuously compiling two Keras models with the same layer name

I encountered a crash when compiling two Keras models (whose first layers share the same name) back to back with relay.frontend.from_keras. More specifically, when the two models are loaded in succession, Keras renames the duplicated layer in the second model, but relay.frontend.from_keras cannot resolve the renamed layer when compiling the second model and crashes with the message "TVMError: Check failed: checked_type.as<IncompleteTypeNode>() == nullptr: Cannot resolve type of Var(conv2d_1_input) at (nullptr)".

Here is a reproducible script (for simplicity, I compile the same model twice):

import keras
import tvm
import numpy as np
from tvm import relay

def loadCifar10Dataset():
    _, (x_test, y_test) = keras.datasets.cifar10.load_data()
    x_test = x_test.astype('float32') / 255.0
    w, h = 32, 32
    x_test = x_test.reshape(x_test.shape[0], w, h, 3)

    test_x, test_y = x_test[:1500],y_test[:1500]
    return test_x, test_y


def my_func(model_path):
    predict_model = keras.models.load_model(model_path)

    x, y = loadCifar10Dataset()
    x_tvm = x.transpose([0, 3, 1, 2])

    input_name = predict_model.input.name.split(':')[0]
    print('\033[1;35m ', predict_model.input, '\033[0m')
    shape_dict = {input_name: x_tvm.shape}
    mod, params = relay.frontend.from_keras(predict_model, shape_dict)
    

if __name__ == '__main__':
    model_path = "your_path/vgg16-cifar10.h5"  # substitute "your_path" with the absolute path where vgg16-cifar10.h5 is stored
    my_func(model_path)
    my_func(model_path)

The model vgg16-cifar10.h5 can be downloaded from Google Drive: https://drive.google.com/file/d/1CQOtLADOOjNwm34OfHc7IiPKBikS3NT7/view?usp=sharing

We call the function my_func twice (two compilations), and on the second call TVM throws an exception:

Traceback (most recent call last):
  File "a.py", line 32, in <module>
    my_func(model_path)
  File "a.py", line 25, in my_func
    mod, params = relay.frontend.from_keras(predict_model, shape_dict)
  File "/tensorflow/install_env/tvm-0.7/tvm/python/tvm/relay/frontend/keras.py", line 919, in from_keras
    return IRModule.from_expr(func), params
  File "/tensorflow/install_env/tvm-0.7/tvm/python/tvm/ir/module.py", line 223, in from_expr
    return _ffi_api.Module_FromExpr(expr, funcs, defs)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 308, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 243, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 232, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 159, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(tvm::RelayExpr tvm::relay::TypeInferencer::Resolver::AttachCheckedType<tvm::relay::FunctionNode>(tvm::relay::FunctionNode const*)+0x1da) [0x7f8d7aa906da]
  [bt] (7) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(tvm::relay::ExprMutator::VisitExpr_(tvm::relay::FunctionNode const*)+0x729) [0x7f8d7ab39cf9]
  [bt] (6) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)+0x76) [0x7f8d7ab3a766]
  [bt] (5) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x7c) [0x7f8d7a9b2f9c]
  [bt] (4) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>*)#3}::_FUN(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>*)+0x13) [0x7f8d7a9b0fa3]
  [bt] (3) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Resolver::VisitExpr_(tvm::relay::VarNode const*)+0x67) [0x7f8d7aa95bc7]
  [bt] (2) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Resolver::VisitVar(tvm::relay::Var const&)+0xca) [0x7f8d7aa9596a]
  [bt] (1) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(tvm::RelayExpr tvm::relay::TypeInferencer::Resolver::AttachCheckedType<tvm::relay::VarNode>(tvm::relay::VarNode const*)+0x1ac) [0x7f8d7aa9320c]
  [bt] (0) /tensorflow/install_env/tvm-0.7/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x22) [0x7f8d7a49a2e2]
  File "/tensorflow/install_env/tvm-0.7/tvm/src/relay/pass/type_infer.cc", line 689
TVMError: Check failed: checked_type.as<IncompleteTypeNode>() == nullptr: Cannot resolve type of Var(conv2d_1_input) at (nullptr)

And the output of the print statement in our script is shown below:

  Tensor("conv2d_1_input:0", shape=(?, 32, 32, 3), dtype=float32) 
  Tensor("conv2d_1_input_1:0", shape=(?, 32, 32, 3), dtype=float32)

It can be deduced from this output that Keras automatically renames the first layer the second time the model is loaded (from "conv2d_1_input" to "conv2d_1_input_1"). The first call to my_func succeeds, but the second fails. It seems that TVM is unable to deal with the layer renamed by Keras. For reference, we performed this experiment with TVM 0.7.dev1 on a workstation with an 8-core CPU, 295 GB of memory, and CentOS Linux release 7.7.1908.
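The renaming happens because Keras keeps a per-session registry of layer names, and loading the same model again forces it to uniquify the duplicate names by appending a numeric suffix. The following is a simplified, hypothetical sketch of that uniquification logic (not Keras's actual code), just to illustrate why the second load sees "conv2d_1_input_1":

```python
def uniquify(name, used_names):
    """Return a name not yet in used_names, appending _1, _2, ...
    in the style of Keras's automatic layer renaming."""
    if name not in used_names:
        used_names.add(name)
        return name
    # Name already taken: try name_1, name_2, ... until one is free.
    i = 1
    while f"{name}_{i}" in used_names:
        i += 1
    unique = f"{name}_{i}"
    used_names.add(unique)
    return unique

used = set()
print(uniquify("conv2d_1_input", used))  # first load:  conv2d_1_input
print(uniquify("conv2d_1_input", used))  # second load: conv2d_1_input_1
```

Given this behavior, a possible workaround (I have not verified it fixes the crash in all cases) is to call keras.backend.clear_session() before each keras.models.load_model, so the name registry is reset and the second load keeps the original layer names.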
