Layer serialization from C++


#1

I’m trying to isolate TVM inference discrepancies between Android (arm64, not working) and Ubuntu (x86_64, working) using a modified version of the from_mxnet.py ResNet-18 + cat example with a Vulkan back end. I’d like to dump the output of each layer to see where things diverge. After calling run() with valid input, what is the best way to dump the output of every layer (not just the final one) from the C++ SDK to support this analysis?


#2

As discussed here, we can create a debug runtime with tvm.graph_runtime_debug.create and then use the get_output_by_layer packaged function (or debug_get_output) for this.

[EDIT: gist translated to a complete github repo here]

The C++ debug_get_output packaged function requires the output DLTensor argument to be pre-allocated with the correct parameters, like this:

TVMArrayAlloc(shape.data(), shape.size(), dtype_code, dtype_bits, dtype_lanes, device_type, device_id, &layer_output2);

These properties are currently parsed while loading the JSON file, but they are stored in private member variables of TVM’s C++ GraphRuntime class, so I ended up copying and pasting the JSON parsing code from GraphRuntime into the C++ example in order to expose the shape and type information needed for the allocations. I didn’t see any other way to do this in TVM, but I might be missing something.
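To illustrate the kind of information that has to be recovered for the allocation, here is a minimal sketch of turning a dtype string into the dtype_code, dtype_bits and dtype_lanes arguments passed to TVMArrayAlloc. It assumes the graph JSON stores each dtype as a string such as "float32" or "int8x4", and that a missing bit width defaults to 32. DTypeFields and ParseDTypeString are hypothetical names defined here for illustration; they are not part of the TVM API. The numeric codes follow DLPack (0 = int, 1 = uint, 2 = float).

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical local mirror of the three DLDataType fields, so the sketch
// compiles without any TVM or DLPack headers.
struct DTypeFields {
  uint8_t code;    // DLPack type code: 0 = int, 1 = uint, 2 = float
  uint8_t bits;    // bit width of one lane
  uint16_t lanes;  // number of vector lanes (1 for scalars)
};

// Hypothetical helper: parses strings such as "float32", "uint8" or "int16x4".
// Assumption: a missing bit width defaults to 32 and missing lanes default to 1.
inline DTypeFields ParseDTypeString(const std::string& s) {
  DTypeFields t{0, 32, 1};
  std::size_t pos = 0;
  if (s.compare(0, 5, "float") == 0) {
    t.code = 2;
    pos = 5;
  } else if (s.compare(0, 4, "uint") == 0) {
    t.code = 1;
    pos = 4;
  } else if (s.compare(0, 3, "int") == 0) {
    t.code = 0;
    pos = 3;
  } else {
    throw std::invalid_argument("unknown dtype: " + s);
  }

  // Optional bit width directly after the type name.
  std::size_t end = pos;
  while (end < s.size() && std::isdigit(static_cast<unsigned char>(s[end]))) {
    ++end;
  }
  if (end > pos) {
    t.bits = static_cast<uint8_t>(std::stoi(s.substr(pos, end - pos)));
  }

  // Optional "x<lanes>" suffix for vector types.
  if (end < s.size()) {
    if (s[end] != 'x' || end + 1 >= s.size()) {
      throw std::invalid_argument("malformed dtype: " + s);
    }
    t.lanes = static_cast<uint16_t>(std::stoi(s.substr(end + 1)));
  }
  return t;
}
```

With this, ParseDTypeString("float32") yields the code, bits and lanes values that would be forwarded to the allocation call, alongside the shape read from the same JSON file.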

After adding that, I noticed the get_output_by_layer call, which doesn’t require pre-allocation of the output DLTensor and seems to return internal DLTensor pointers by value, so the GraphRuntime modification is not required in that case. I simply call run() and then iterate over each layer to perform the logging. We still need to know the number of layers to iterate over, though, which doesn’t seem to be exposed in the existing API.
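Once the per-layer outputs have been logged on both platforms, finding where they diverge can be sketched as below. This is only an illustration under assumptions: each layer's output is assumed to have been copied into a flat vector of floats, and LayerDump, MaxAbsDiff and FirstDivergingLayer are hypothetical names, not TVM functions. The tolerance is something to tune per model.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container: one layer's output flattened to floats.
using LayerDump = std::vector<float>;

// Largest absolute element-wise difference between two layer dumps.
// Returns infinity when the sizes differ or a NaN is encountered, so that
// such layers are always reported as diverging.
inline float MaxAbsDiff(const LayerDump& a, const LayerDump& b) {
  const float inf = std::numeric_limits<float>::infinity();
  if (a.size() != b.size()) return inf;
  float worst = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = std::fabs(a[i] - b[i]);
    if (std::isnan(d)) return inf;
    worst = std::max(worst, d);
  }
  return worst;
}

// Index of the first layer whose difference exceeds tol, or -1 if none does.
// If one run logged fewer layers, the first missing index is reported.
inline int FirstDivergingLayer(const std::vector<LayerDump>& reference,
                               const std::vector<LayerDump>& candidate,
                               float tol) {
  const std::size_t n = std::min(reference.size(), candidate.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (MaxAbsDiff(reference[i], candidate[i]) > tol) {
      return static_cast<int>(i);
    }
  }
  return reference.size() == candidate.size() ? -1 : static_cast<int>(n);
}
```

Here the x86_64 dumps would serve as the reference and the arm64 dumps as the candidate; the returned index points at the first layer worth inspecting.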