I’m trying to isolate TVM inference discrepancies between Android (arm64, not working) and Ubuntu (x86_64, working) using a modified version of the from_mxnet.py ResNet-18 + cat example with a Vulkan backend. I’d like to dump the output of each layer to see where things diverge. After calling run() with valid input, what is the best way to dump the output of every layer (not just the final one) from the C++ SDK to support this analysis?
As discussed here, we can create a debug runtime with tvm.graph_runtime_debug.create and use the get_output_by_layer packed function (or debug_get_output) for this.
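For reference, the same debug runtime can also be created from C++ by looking up the registered global function. This is only a sketch under two assumptions: that TVM was built with the debug graph runtime enabled, and that the registered global name matches the Python-side helper.

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>
#include <stdexcept>
#include <string>

// Sketch: create the debug graph runtime from C++ (assumes the debug
// runtime was compiled into libtvm; the global name mirrors the
// Python helper tvm.graph_runtime_debug.create).
tvm::runtime::Module CreateDebugRuntime(const std::string& graph_json,
                                        tvm::runtime::Module lib,
                                        int device_type, int device_id) {
  const tvm::runtime::PackedFunc* create =
      tvm::runtime::Registry::Get("tvm.graph_runtime_debug.create");
  if (create == nullptr) {
    // Registry::Get returns nullptr when the function is not registered.
    throw std::runtime_error("tvm.graph_runtime_debug.create not available");
  }
  return (*create)(graph_json, lib, device_type, device_id);
}
```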
[EDIT: the gist has been translated to a complete GitHub repo here]
Calling the C++ debug_get_output packed function requires the output DLTensor argument to be pre-allocated with the correct parameters, like this:
TVMArrayAlloc(shape.data(), shape.size(), dtype_code, dtype_bits, dtype_lanes, device_type, device_id, &layer_output2);
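Putting the allocation and the call together, a minimal sketch might look like the following. The shape, dtype, and layer index are placeholders (in practice they come from the graph JSON), and debug_mod is assumed to be the module returned by the debug runtime creation above.

```cpp
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <cstdint>
#include <vector>

// Sketch: fetch one layer's output into a pre-allocated DLTensor via
// the "debug_get_output" packed function. Shape/dtype values below are
// hypothetical -- they must match what the graph JSON declares.
void DumpLayer(tvm::runtime::Module debug_mod, int layer_index) {
  std::vector<int64_t> shape = {1, 64, 56, 56};  // hypothetical layer shape
  DLTensor* layer_output = nullptr;
  TVMArrayAlloc(shape.data(), shape.size(),
                /*dtype_code=*/kDLFloat, /*dtype_bits=*/32,
                /*dtype_lanes=*/1, /*device_type=*/kDLVulkan,
                /*device_id=*/0, &layer_output);
  tvm::runtime::PackedFunc debug_get_output =
      debug_mod.GetFunction("debug_get_output");
  debug_get_output(layer_index, layer_output);  // copies into our buffer
  // ... log layer_output->data here ...
  TVMArrayFree(layer_output);
}
```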
These properties are currently parsed while loading the JSON file, but they are private member variables of TVM’s C++ GraphRuntime class, so I ended up copy-and-pasting the JSON parsing code from GraphRuntime into the C++ example in order to expose those types and support the required allocations. I didn’t see any other way to do this in TVM, but I might be missing something.
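As a dependency-free alternative to copying GraphRuntime’s parser, the shapes can be scraped directly from the graph JSON text, which stores them under attrs as "shape": ["list_shape", [[1,3,224,224], ...]]. This is a hypothetical bracket-counting sketch for illustration, not TVM code; a real loader should use a proper JSON library.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Sketch: extract the per-node shapes from the graph JSON "attrs"
// section by scanning the "list_shape" array with bracket counting.
std::vector<std::vector<int64_t>> ParseShapes(const std::string& json) {
  std::vector<std::vector<int64_t>> shapes;
  size_t pos = json.find("\"list_shape\"");
  if (pos == std::string::npos) return shapes;
  pos = json.find('[', pos + 12);  // start of the outer shape list
  int depth = 0;
  std::vector<int64_t> cur;
  std::string num;
  for (size_t i = pos; i < json.size(); ++i) {
    char c = json[i];
    if (c == '[') {
      ++depth;
    } else if (std::isdigit(static_cast<unsigned char>(c))) {
      num += c;
    } else {
      if (!num.empty()) { cur.push_back(std::stoll(num)); num.clear(); }
      if (c == ']') {
        --depth;
        if (depth == 1) { shapes.push_back(cur); cur.clear(); }
        if (depth == 0) break;  // end of the outer list
      }
    }
  }
  return shapes;
}
```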
After adding that, I noticed the get_output_by_layer call, which doesn’t require pre-allocating the output DLTensor and appears to return internal DLTensor pointers by value, so the GraphRuntime modification is not required in that case. I simply call run() and then iterate over each layer to perform the logging. Even then, we still need to know the number of layers to iterate over, which doesn’t seem to be exposed in the existing API.
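The iteration described above might look like this sketch. The layer count is an assumption (the API does not expose it, so it would have to come from counting the "nodes" entries in the graph JSON), and LogTensor is a hypothetical logging helper.

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/ndarray.h>

// Sketch: after run(), dump every layer via "get_output_by_layer".
// num_layers must be obtained externally (e.g. by counting "nodes"
// entries in the graph JSON); LogTensor is a hypothetical helper.
void DumpAllLayers(tvm::runtime::Module debug_mod, int num_layers) {
  tvm::runtime::PackedFunc run = debug_mod.GetFunction("run");
  tvm::runtime::PackedFunc get_output_by_layer =
      debug_mod.GetFunction("get_output_by_layer");
  run();
  for (int i = 0; i < num_layers; ++i) {
    tvm::runtime::NDArray out = get_output_by_layer(i, 0);
    // out aliases the runtime's internal storage; copy before mutating.
    LogTensor(i, out);
  }
}
```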