Wrong output from Keras model

Hello everyone,
I’m using a multi-output Keras model of YOLOv3. I’m able to get the multiple outputs from the model, but they are wrong. I tried switching the target between opencl and llvm with every opt_level in [0-4] in the TVM compiler, feeding float32 input data, but none of the results match the original model’s predictions, or even come close.

Any suggestions are welcome.
Thanks!

Can you tell us which conversion script you used to build the Keras YOLOv3 model?

I got the script from: https://github.com/allanzelener/YAD2K.
How can I cross-check the weights in the NNVM params?
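(In case it helps frame the question: I imagine the cross-check would look something like the sketch below, though the parameter name and the OIHW kernel layout assumed for the NNVM side are guesses on my part, not something I’ve verified against the frontend.)

```python
import numpy as np

# Hypothetical sketch of a weight cross-check between Keras and NNVM params.
# 'conv1/kernel', 'conv1_weight', and the OIHW layout are assumptions.
keras_weights = {'conv1/kernel': np.random.rand(3, 3, 3, 16).astype('float32')}  # HWIO
nnvm_params = {'conv1_weight': keras_weights['conv1/kernel'].transpose(3, 2, 0, 1)}  # OIHW

# Transpose the Keras kernel HWIO -> OIHW, then compare elementwise.
np.testing.assert_allclose(
    keras_weights['conv1/kernel'].transpose(3, 2, 0, 1),
    nnvm_params['conv1_weight'],
    rtol=1e-6,
)
```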

It seems that the YOLOv3 model doesn’t include a flatten layer to convert the shape layout of the output.

Can you change your code here from

out_shape = [dim.value if dim.value is not None else 1 for dim in keras_yolo._output_layers[0].output.shape]
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, 'float32')).asnumpy()

to

(n, h, w, c) = [dim.value if dim.value is not None else 1 for dim in keras_yolo._output_layers[0].output.shape]
tvm_out = m.get_output(0, tvm.nd.empty((n, c, h, w), 'float32')).asnumpy()
tvm_out = tvm_out.transpose([0, 2, 3, 1])

and try it again?
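For reference, the NCHW → NHWC transpose above can be illustrated with a dummy array (the head shape used here is just an assumed example, not taken from your model):

```python
import numpy as np

# TVM produces outputs in NCHW layout, while Keras (with the default
# "channels_last" setting) uses NHWC. The transpose axes (0, 2, 3, 1)
# convert NCHW -> NHWC without changing any values.
n, c, h, w = 1, 255, 13, 13  # assumed example shape for a YOLOv3 head
nchw = np.arange(n * c * h * w, dtype='float32').reshape(n, c, h, w)

nhwc = nchw.transpose([0, 2, 3, 1])
assert nhwc.shape == (n, h, w, c)

# Each element keeps its value; only the axis ordering changes:
assert nhwc[0, 2, 3, 5] == nchw[0, 5, 2, 3]
```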

Hey @kazum,
Thanks for your response. I actually tried this possibility earlier. As I understand it, the problem is not about reshaping the output values but about the prediction values from the compiled model themselves. I cross-checked every output value of the TVM model against the Keras model (just in case of any misalignment of the TVM output), but none of them matched, or even came close. So I assume there is an issue with the Keras-to-TVM compilation for this conversion.

@kazum @PariksheetPinjari909, any updates? :slight_smile:

I tried the following code and no assertions were raised. It looks like the outputs of Keras and TVM are at least close. If you get errors with my example, can you share your yolo.h5 file?

import keras
import numpy as np
import nnvm
import tvm

keras_model = keras.models.load_model('yolo.h5')

in_shapes = []
for layer in keras_model._input_layers:
    in_shapes.append([1 if dim is None else dim for dim in layer.input_shape])
out_shapes = []
for layer in keras_model._output_layers:
    out_shapes.append([1 if dim is None else dim for dim in layer.output_shape])

def get_tvm_output(xs):
    # The NNVM Keras frontend works in NCHW layout, while Keras itself uses
    # NHWC, so inputs are transposed to NCHW here and outputs are transposed
    # back to NHWC below.
    def to_channels_last(shape):
        return [shape[0]] + list(shape[2:]) + [shape[1]]

    def to_channels_first(shape):
        return [shape[0], shape[-1]] + list(shape[1:-1])

    dtype = 'float32'
    xs = [x.transpose(to_channels_first(range(x.ndim))) for x in xs]
    sym, params = nnvm.frontend.from_keras(keras_model)
    shape_dict = {name: x.shape for (name, x) in zip(keras_model.input_names, xs)}
    graph, lib, params = nnvm.compiler.build(sym, "llvm", shape_dict, params=params)
    m = tvm.contrib.graph_runtime.create(graph, lib, tvm.cpu())
    for name, x in zip(keras_model.input_names, xs):
        m.set_input(name, tvm.nd.array(x.astype(dtype)))
    m.set_input(**params)
    m.run()

    tvm_out = []
    for i, shape in enumerate(out_shapes):
        # TVM returns NCHW outputs; transpose back to NHWC for comparison.
        out = m.get_output(i, tvm.nd.empty(to_channels_first(shape), dtype)).asnumpy()
        out = out.transpose(to_channels_last(range(out.ndim)))
        tvm_out.append(out)
    return tvm_out

xs = [np.random.uniform(size=shape) for shape in in_shapes]

keras_out = keras_model.predict(xs)
tvm_out = get_tvm_output(xs)

for a, b in zip(keras_out, tvm_out):
    np.testing.assert_allclose(a, b, rtol=1e-4, atol=1e-4)

@kazum thanks for this example. My mistake was copying the output directly without accounting for the channels-first layout. I followed your implementation: first copy into an empty channels-first array, then transpose it to channels-last. Yes, the values are closely matched (within a small error tolerance), and it finally works. It was mentioned nowhere that the outputs are also channels-first, i.e. [batch_size, C, H, W], hence the confusion; please update the docs/tutorials :). However, the GPU-to-CPU copying bottleneck (sync or async) makes this implementation unsuitable for my application. I would like to share some results of my implementation, in case they help someone:

  1. target=gpu, time_to_process = ~0.01s, time_to_copy = ~4.8s (even with tvm gpu sync)
  2. target=cpu, time_to_process = ~2.8s, time_to_copy = ~0.01s
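For anyone reproducing these numbers, the pattern I used to time the two phases separately looks roughly like the sketch below. On a real GPU target the “processing” phase would be `m.run()` followed by a context sync, and the “copying” phase would be `m.get_output(...).asnumpy()`; here a NumPy matmul and an array copy stand in for those phases, since the point is only the measurement pattern, not the actual workload:

```python
import time
import numpy as np

def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

a = np.random.rand(256, 256).astype('float32')
_, t_process = timed(lambda: a @ a)   # stand-in for m.run() + device sync
_, t_copy = timed(lambda: a.copy())   # stand-in for device-to-host copy
```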

Thanks again!