How to get multiple outputs from module.get_output()?


Currently, all the examples and tutorials I can find use only one output. However, some models have multiple output layers, for example SSD MobileNet.

In this situation, how do we get multiple outputs?

Additionally, I find that the output shape is sometimes (1000,) and sometimes (1, 1000). Are both OK?

    # get nth output with out_shape as output shape
    out = module.get_output(n, out=tvm.nd.empty(out_shape, ctx=ctx))


Thanks, it works. But what if the output shapes are different? For example, with ssd_mobilenet the two output shapes differ. Should we call module.get_output twice, as module.get_output(0, out=tvm.nd.empty(out_shape_0, ctx=ctx)) and module.get_output(1, out=tvm.nd.empty(out_shape_1, ctx=ctx)), or set out_shape = (out_shape_0, out_shape_1) and then call module.get_output(1, out=tvm.nd.empty(out_shape, ctx=ctx))?


tvm.nd.empty(out_shape, ctx=ctx) allocates an array, so you can allocate a separate array for each output. It is recommended not to allocate arrays repeatedly; instead, allocate each one once and pass it to the out parameter.


So, you mean I can do:
module.get_output(0, out=tvm.nd.empty(out_shape_0, ctx=ctx))
module.get_output(1, out=tvm.nd.empty(out_shape_1, ctx=ctx))


I think @tqchen means

# allocate once
out_0 = tvm.nd.empty(out_shape_0, ctx=ctx)
out_1 = tvm.nd.empty(out_shape_1, ctx=ctx)

# run inference many times
for i in range(...):
    # set input and run, then
    # pass the preallocated arrays as `out`
    module.get_output(0, out_0)
    module.get_output(1, out_1)
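
The allocate-once pattern above can be sketched in a form that runs without TVM installed. Everything named here is an assumption made for illustration: `FakeGraphModule` is a hypothetical stand-in that only mimics "copy output `index` into the caller's buffer", the helpers `allocate_output_buffers` and `fetch_outputs` are not TVM functions, and the shapes are made up rather than real SSD shapes. With a real module, the `alloc` callable would presumably wrap `tvm.nd.empty(shape, ctx=ctx)` as shown earlier in the thread.

```python
class FakeGraphModule:
    """Hypothetical stand-in for a TVM graph module (not a TVM class)."""

    def __init__(self, out_shapes):
        self.out_shapes = list(out_shapes)
        self.runs = 0

    def run(self):
        self.runs += 1

    def get_output(self, index, out):
        # Mimic copying output `index` into the caller's buffer: fill it
        # in place with a value that encodes (run count, output index).
        for i in range(len(out)):
            out[i] = float(10 * self.runs + index)
        return out


def num_elements(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def allocate_output_buffers(out_shapes, alloc):
    """Allocate one buffer per output, once, in output-index order."""
    return [alloc(shape) for shape in out_shapes]


def fetch_outputs(module, buffers):
    """Copy every output into its preallocated buffer."""
    for index, buf in enumerate(buffers):
        module.get_output(index, buf)
    return buffers


out_shapes = [(1, 4), (2, 3)]  # illustrative shapes only
module = FakeGraphModule(out_shapes)

# Assumption: with TVM, alloc would be
#   lambda shape: tvm.nd.empty(shape, ctx=ctx)
# Here a flat Python list plays the role of the array.
buffers = allocate_output_buffers(
    out_shapes, lambda shape: [0.0] * num_elements(shape)
)
buffer_ids = [id(buf) for buf in buffers]

for _ in range(3):
    module.run()
    fetch_outputs(module, buffers)  # the same buffers are reused every time
```

Each output gets its own buffer with its own shape, so the two shapes never need to be combined into one `out_shape`, and no allocation happens inside the loop.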


A similar question:

I'm writing a C++ test application to test inference using MXNet + ResNet.
This network requires two outputs (1, 1000), but the example code below only considers one output.
For the build, I modified it to out_ndim = 2; int64_t out_shape[2] = {1, 1000, };
It builds, but it seems to show the wrong result below:
The maximum position in output vector is: 0

Could you give me some advice on how I could correct it?

Inki Dae

DLTensor* y;
int out_ndim = 1;
int64_t out_shape[1] = {1000, };
TVMArrayAlloc(out_shape, out_ndim, dtype_code, dtype_bits, dtype_lanes, device_type, device_id, &y);
// get the function from the module(get output data)
tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");
get_output(0, y);
// get the maximum position in output vector
auto y_iter = static_cast<float*>(y->data);
auto max_iter = std::max_element(y_iter, y_iter + 1000);
auto max_index = std::distance(y_iter, max_iter);
std::cout << "The maximum position in output vector is: " << max_index << std::endl;
TVMArrayFree(x);
TVMArrayFree(y);


Excuse me, did you solve this problem?



I am testing a model with 3 outputs. However, when I call m.get_output(0), for example for index 0, I keep getting different outputs. It seems the outputs are randomly mapped to the output indexes.

Is this a bug? Or is there a way to get the outputs at deterministic indexes?



I would like to share the solution to my problem described in the previous comment.

The issue is that I defined the outputs of the model as a “set”, as follows:

outputs = {'output1', 'output2', 'output3'}
mod, params = relay.frontend.from_tensorflow(graph_def, layout=layout, outputs=outputs, shape=shape_dict)

This results in a random order of the outputs when the set is indexed. I used this code because I found it in an example of how to define the outputs; however, the right way is to use a list, as follows:

outputs = ['output1', 'output2', 'output3']
mod, params = relay.frontend.from_tensorflow(graph_def, layout=layout, outputs=outputs, shape=shape_dict)

It is a small difference, but it is important to be aware of it.
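
To guard against this mistake, one option is a small check before the names reach the frontend. This is only a sketch: `ordered_output_names` is a hypothetical helper, not part of TVM or its TensorFlow frontend, and the output names are the placeholders from the post above. The background is that Python gives no ordering guarantee for a set, and string hashing is randomized per interpreter run by default, so the position of a name in a set can change between runs, while a list keeps the order it was written in.

```python
def ordered_output_names(outputs):
    """Return the output names as a list, rejecting unordered containers.

    Hypothetical helper for illustration; not a TVM function.
    """
    if isinstance(outputs, (set, frozenset)):
        raise TypeError(
            "outputs must be an ordered sequence (list or tuple), not a set"
        )
    return list(outputs)


names = ordered_output_names(['output1', 'output2', 'output3'])

# With an ordered sequence, the position of each name is fixed, so it can
# be used as the index passed to get_output.
index_of = {name: i for i, name in enumerate(names)}

# A set is rejected instead of silently producing an arbitrary order.
try:
    ordered_output_names({'output1', 'output2', 'output3'})
    set_was_rejected = False
except TypeError:
    set_was_rejected = True
```

The resulting `names` list could then be passed as the `outputs` argument in the call shown above, and `index_of` gives the matching index for each name.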


I guess copying the output (which might be large) from GPU to CPU asynchronously could be more time-efficient.


I’ve done so successfully.