Yolov3-tiny batch input test failed

Hi,

I’m trying to run inference on the “yolov3-tiny” model with input batch_size = 4.

The input shape was (4, 3, 416, 416).

However, the shapes of the two outputs are as follows:

module.get_output(0) --> (1, 255, 26, 26)

module.get_output(1) --> (1, 255, 13, 13)

IMHO, the problem occurs when the following code is executed:

input_shape = (4, 3, 416, 416)
mod, params = relay.frontend.from_darknet(net, dtype=dtype, shape=input_shape)
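
For reference, here is the surrounding build-and-run flow as a minimal sketch (the "llvm" target, the "data" input name, and the use of graph_executor are assumptions based on the standard from_darknet flow; net is the darknet network already loaded):

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

dtype = "float32"
input_shape = (4, 3, 416, 416)
mod, params = relay.frontend.from_darknet(net, dtype=dtype, shape=input_shape)

target = "llvm"  # assumption: CPU target, just for the sketch
lib = relay.build(mod, target=target, params=params)
dev = tvm.device(target, 0)
module = graph_executor.GraphModule(lib["default"](dev))

module.set_input("data", np.zeros(input_shape, dtype=dtype))
module.run()
print(module.get_output(0).shape)  # prints (1, 255, 26, 26) instead of (4, 255, 26, 26)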

When I print out mod["main"], it seems that the reshape op does not preserve the batch dimension:

%49 = nn.leaky_relu(%48, alpha=0.1f) /* ty=Tensor[(4, 256, 26, 26), float32] */;
%50 = nn.conv2d(%49, %LAYERTYPE.CONVOLUTIONAL22_weight, padding=[0, 0, 0, 0], channels=255, kernel_size=[1, 1]) /* ty=Tensor[(4, 255, 26, 26), float32] */;
%51 = nn.bias_add(%50, %LAYERTYPE.CONVOLUTIONAL22_bias) /* ty=Tensor[(4, 255, 26, 26), float32] */;
%52 = reshape(%51, newshape=[1, 3, 85, 26, 26]) /* ty=Tensor[(1, 3, 85, 26, 26), float32] */;
%53 = split(%52, indices_or_sections=[2, 4], axis=2) /* ty=(Tensor[(1, 3, 2, 26, 26), float32], Tensor[(1, 3, 2, 26, 26), float32], Tensor[(1, 3, 81, 26, 26), float32]) */;
%54 = %53.0;
%55 = sigmoid(%54) /* ty=Tensor[(1, 3, 2, 26, 26), float32] */;
%56 = %53.1;
%57 = %53.2;
%58 = sigmoid(%57) /* ty=Tensor[(1, 3, 81, 26, 26), float32] */;
%59 = (%55, %56, %58);
%60 = concatenate(%59, axis=2) /* ty=Tensor[(1, 3, 85, 26, 26), float32] */;
%61 = reshape(%60, newshape=[1, 255, 26, 26]) /* ty=Tensor[(1, 255, 26, 26), float32] */;

As shown in the results above, the output shape of the reshape in %52 is (1, 3, 85, 26, 26) rather than the expected (4, 3, 85, 26, 26): the batch dimension is hard-coded to 1.

Does anyone have an idea how to resolve this problem?

best wishes,

R. Kim

diff --git a/python/tvm/relay/frontend/darknet.py b/python/tvm/relay/frontend/darknet.py
index 936d7c0dc..62a320780 100644
--- a/python/tvm/relay/frontend/darknet.py
+++ b/python/tvm/relay/frontend/darknet.py
@@ -637,12 +637,12 @@ class GraphProto(object):
             attr.update({'coords' : layer.coords})
             attr.update({'background' : layer.background})
             attr.update({'softmax' : layer.softmax})
-            attr.update({'shape' : (1, layer.c, layer.h, layer.w)})
+            attr.update({'shape' : (-1, layer.c, layer.h, layer.w)})
 
         elif LAYERTYPE.YOLO == layer_type:
             attr.update({'n' : layer.n})
             attr.update({'classes' : layer.classes})
-            attr.update({'shape' : (1, layer.c, layer.h, layer.w)})
+            attr.update({'shape' : (-1, layer.c, layer.h, layer.w)})
 
         elif LAYERTYPE.UPSAMPLE == layer_type:
             attr.update({'scale' : layer.stride})
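
This diff works because Relay's reshape treats -1 in newshape as a dimension to be inferred, so the batch size is preserved at type inference time. A minimal standalone check:

import tvm
from tvm import relay

x = relay.var("x", shape=(4, 255, 26, 26), dtype="float32")
# -1 lets type inference derive the leading (batch) dimension
y = relay.reshape(x, newshape=(-1, 3, 85, 26, 26))
mod = tvm.IRModule.from_expr(relay.Function([x], y))
mod = relay.transform.InferType()(mod)
print(mod["main"])  # body is typed Tensor[(4, 3, 85, 26, 26), float32]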

You can apply this diff to python/tvm/relay/frontend/darknet.py to quickly solve this issue. It changes your output to:

module.get_output(0) --> (4, 255, 26, 26)

module.get_output(1) --> (4, 255, 13, 13)

But you also need the post-processing to take the batch dimension into account:

layer_out['output'] = layer_out['output'].reshape(out_shape)

Here out_shape must account for the batch size as well; currently it is written to handle only a batch_size of 1.
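
As a rough sketch of that change (variable names follow the from_darknet tutorial's post-processing; slicing the batch per image is just one simple option, and batch_size = 4 is an assumption):

import numpy as np

batch_size = 4  # must match the batch size used at import time

out = module.get_output(0).asnumpy()  # (4, 255, 26, 26) after the fix
n, c = 3, 85                          # 255 = 3 anchors x (80 classes + 5)
h, w = out.shape[2], out.shape[3]

for b in range(batch_size):
    # reshape one image's tensor exactly as the batch-1 code does, then run
    # the existing box extraction / NMS on it
    layer_output = out[b].reshape((n, c, h, w))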

Siju,

The problem is solved!!

When I run Yolov3-tiny on a Jetson Nano, it takes about 35 ms for single-image inference.

Now it takes about 120 ms for four-image (batch-4) inference.

I greatly appreciate your response.

@kitkat

What is the target backend when you get this timing performance?

Thanks

The target device is a Jetson Nano.

AutoTVM was used to derive the best tuning logs for the conv2d layers on the CUDA backend with the sm_52 compute capability option.
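
For context, applying the AutoTVM logs at build time looks roughly like this (the log file name is hypothetical; mod and params come from relay.frontend.from_darknet):

import tvm
from tvm import autotvm, relay

target = "cuda -arch=sm_52"  # the compute capability option mentioned above

# apply the best conv2d schedules found by AutoTVM while building
with autotvm.apply_history_best("yolov3-tiny-nano.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)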
