Layer serialization from C++

headupinclouds · December 29, 2018, 6:08am

I’m trying to isolate TVM inference discrepancies between Android (arm64, not working) and Ubuntu (x86_64, working) using a modified version of the from_mxnet.py ResNet-18 + cat example with a Vulkan back end. I’d like to dump the output of each layer to see where the results diverge. After calling run() with valid input, what is the best way to dump the outputs of all layers (not just the final layer) from the C++ SDK to support this analysis?

headupinclouds · January 6, 2019, 11:07pm

As discussed here, we can create a debug runtime with tvm.graph_runtime_debug.create and use its get_output_by_layer (or debug_get_output) packed function for this.
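Once per-layer outputs have been dumped on both platforms (for example by calling get_output_by_layer after run() and copying each tensor to host memory), finding where Android and Ubuntu diverge comes down to comparing the dumps layer by layer. A minimal sketch, assuming both dumps are already flattened float32 vectors in the same layer order; `LayerDump` and `first_divergent_layer` are hypothetical names, not part of TVM.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One layer's output copied to host memory and flattened to float32
// (assumption: every compared layer is float32).
using LayerDump = std::vector<float>;

// Returns the index of the first layer whose dumps differ in size or by more
// than `tol` (absolute) at any element, or -1 if all compared layers match.
// Only the first min(a.size(), b.size()) layers are compared.
int first_divergent_layer(const std::vector<LayerDump>& a,
                          const std::vector<LayerDump>& b, float tol) {
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  for (std::size_t layer = 0; layer < n; ++layer) {
    if (a[layer].size() != b[layer].size()) return static_cast<int>(layer);
    for (std::size_t i = 0; i < a[layer].size(); ++i) {
      const float diff = std::fabs(a[layer][i] - b[layer][i]);
      // A NaN on either side makes diff NaN, which fails this test and is
      // reported as a divergence.
      if (!(diff <= tol)) return static_cast<int>(layer);
    }
  }
  return -1;
}
```

An absolute tolerance keeps the sketch simple; since different devices and drivers rarely produce bit-identical floating-point results, a relative tolerance may suit layers with large activations better.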

[EDIT: the gist has since been turned into a complete GitHub repo here]


https://gist.github.com/headupinclouds/d87df2dcc9603589b3b9b9cb00f26011#file-tvm_deploy_gpu_sample-cpp-L521

CMakeLists.txt

cmake_minimum_required(VERSION 3.2)
project(tvm-deploy-gpu-sample)

include_directories(/dl/mxnet/3rdparty/tvm/3rdparty/dlpack/include)
include_directories(/dl/mxnet/3rdparty/tvm/3rdparty/dmlc-core/include)
include_directories(/dl/mxnet/3rdparty/tvm/include)

function(print_cmake_vars)
  get_cmake_property(_variableNames VARIABLES)
  list (SORT _variableNames)

This file has been truncated.

tvm_deploy_gpu_sample.cpp

#define TUI_SAVE_LAYERS 0
#define TUI_TEST_DEBUG_GET_OUTPUT 1

#include <string>
#include <cstring>
#include <fstream>
#include <algorithm>
#include <chrono>
#include <iomanip>

This file has been truncated.

tvm_runtime_pack.cc

/*!
 * \brief This is an all in one TVM runtime file.
 *
 *   You only have to use this file to compile libtvm_runtime to
 *   include in your project.
 *
 *  - Copy this file into your project which depends on tvm runtime.
 *  - Compile with -std=c++11
 *  - Add the following include path
 *     - /path/to/tvm/include/

This file has been truncated.

The C++ debug_get_output packed function requires the output DLTensor argument to be pre-allocated with the correct shape, data type, and device, like this:

TVMArrayAlloc(shape.data(), shape.size(), dtype_code, dtype_bits, dtype_lanes, device_type, device_id, &layer_output2);
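The dtype_code, dtype_bits, and dtype_lanes arguments correspond to the dtype strings ("float32" and so on) stored in the graph JSON. A minimal sketch of that mapping, assuming only int/uint/float strings with an optional xN lanes suffix; the numeric codes are redeclared to mirror DLPack's kDLInt/kDLUInt/kDLFloat so the example compiles without dlpack.h, and `parse_dltype`, `AllocParams`, and `tensor_nbytes` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Numeric type codes mirroring DLDataTypeCode in dlpack.h
// (kDLInt = 0, kDLUInt = 1, kDLFloat = 2), redeclared so this sketch
// compiles without the DLPack headers.
enum DTypeCode : int { kCodeInt = 0, kCodeUInt = 1, kCodeFloat = 2 };

// The three dtype arguments TVMArrayAlloc takes after the shape.
struct AllocParams {
  int dtype_code;
  int dtype_bits;
  int dtype_lanes;
};

// Parses "float32", "int8", "uint8", "float32x4", ... into (code, bits, lanes).
// Hypothetical helper; only int/uint/float prefixes are handled.
AllocParams parse_dltype(const std::string& s) {
  AllocParams p{0, 0, 1};
  std::string rest;
  if (s.compare(0, 4, "uint") == 0) {
    p.dtype_code = kCodeUInt;
    rest = s.substr(4);
  } else if (s.compare(0, 3, "int") == 0) {
    p.dtype_code = kCodeInt;
    rest = s.substr(3);
  } else if (s.compare(0, 5, "float") == 0) {
    p.dtype_code = kCodeFloat;
    rest = s.substr(5);
  } else {
    throw std::invalid_argument("unsupported dtype: " + s);
  }
  const std::size_t x = rest.find('x');  // optional lanes suffix
  p.dtype_bits = std::stoi(rest.substr(0, x));
  if (x != std::string::npos) p.dtype_lanes = std::stoi(rest.substr(x + 1));
  return p;
}

// Bytes needed to hold a tensor of `shape` with this dtype on the host,
// e.g. when copying a layer output out for logging.
std::size_t tensor_nbytes(const std::vector<std::int64_t>& shape, const AllocParams& p) {
  std::size_t count = 1;
  for (std::int64_t d : shape) count *= static_cast<std::size_t>(d);
  // Round bits * lanes up to whole bytes per element.
  return count * ((static_cast<std::size_t>(p.dtype_bits) * p.dtype_lanes + 7) / 8);
}
```

The returned triple can be passed straight into the dtype_code, dtype_bits, and dtype_lanes slots of the TVMArrayAlloc call above.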

These properties are currently parsed while loading the JSON file, but they are private member variables of TVM’s C++ GraphRuntime class, so I ended up copying and pasting the JSON parsing code from GraphRuntime into the C++ example to expose them for the required allocations. I didn’t see any other way to do this in TVM, but I might be missing something.
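The part of that parsing needed for allocation is small: in the graph JSON, each node's outputs occupy consecutive entries starting at node_row_ptr[nid], and the attrs "shape" and "dltype" lists are indexed by that entry id (GraphRuntime computes it the same way in its entry_id helper, as far as I can tell). A minimal sketch, assuming those three arrays have already been parsed into plain vectors; `GraphAttrs`, `EntryInfo`, and `lookup_output` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Pieces of the graph JSON needed for allocation, assumed already parsed.
struct GraphAttrs {
  std::vector<std::uint32_t> node_row_ptr;       // first entry id of each node (+ final total)
  std::vector<std::vector<std::int64_t>> shape;  // attrs "shape", indexed by entry id
  std::vector<std::string> dltype;               // attrs "dltype", indexed by entry id
};

struct EntryInfo {
  std::vector<std::int64_t> shape;
  std::string dltype;
};

// Entry id of output `index` of node `nid`.
std::uint32_t entry_id(const GraphAttrs& g, std::uint32_t nid, std::uint32_t index) {
  return g.node_row_ptr.at(nid) + index;
}

// Shape and dtype string needed to pre-allocate the output DLTensor for
// output `index` of node `nid`; .at() throws std::out_of_range on a bad id.
EntryInfo lookup_output(const GraphAttrs& g, std::uint32_t nid, std::uint32_t index) {
  const std::uint32_t eid = entry_id(g, nid, index);
  return EntryInfo{g.shape.at(eid), g.dltype.at(eid)};
}
```

Keeping only these arrays avoids copying the rest of GraphRuntime's loader; the shape vector feeds the shape arguments of TVMArrayAlloc directly.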

After adding that, I noticed the get_output_by_layer call, which doesn’t require a pre-allocated output DLTensor; it seems to return the runtime’s internal DLTensor pointers by value, so the GraphRuntime modification isn’t needed in that case. I simply call run() and then iterate over the layers to log each output. We still need to know the number of layers to iterate over, though, and that doesn’t seem to be exposed by the existing API.
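The number of layers can be recovered from the same graph JSON that is loaded to create the runtime: node_row_ptr has one element per node plus a trailing total, so the node count is its length minus one, and node nid has node_row_ptr[nid + 1] - node_row_ptr[nid] outputs. A minimal sketch under these assumptions: it uses a simple scanner rather than a real JSON parser (the key must appear once with a flat array of non-negative integers), and `visit` is a hypothetical stand-in for the per-layer get_output_by_layer call and logging. Whether get_output_by_layer takes (node index, output index) in that order is worth checking against graph_runtime_debug.cc.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the flat integer array stored under "node_row_ptr" in a graph JSON
// string. Minimal scanner, not a JSON parser: assumes the key occurs once
// and its value is a flat array of non-negative integers.
std::vector<long> read_node_row_ptr(const std::string& json) {
  const std::string key = "\"node_row_ptr\"";
  const std::size_t key_pos = json.find(key);
  if (key_pos == std::string::npos) throw std::runtime_error("node_row_ptr not found");
  const std::size_t open = json.find('[', key_pos + key.size());
  const std::size_t close = json.find(']', open);
  if (open == std::string::npos || close == std::string::npos)
    throw std::runtime_error("malformed node_row_ptr");
  std::vector<long> values;
  long current = 0;
  bool in_number = false;
  for (std::size_t i = open + 1; i < close; ++i) {
    const char c = json[i];
    if (c >= '0' && c <= '9') {
      current = current * 10 + (c - '0');
      in_number = true;
    } else if (in_number) {  // a separator ends the current number
      values.push_back(current);
      current = 0;
      in_number = false;
    }
  }
  if (in_number) values.push_back(current);
  return values;
}

// node_row_ptr holds one element per node plus a trailing total.
std::size_t num_nodes(const std::vector<long>& row_ptr) {
  return row_ptr.empty() ? 0 : row_ptr.size() - 1;
}

// Calls visit(nid, output_index) for every output of every node.
// `visit` stands in for the per-layer get_output_by_layer call + logging.
template <typename Visit>
void for_each_node_output(const std::vector<long>& row_ptr, Visit visit) {
  for (std::size_t nid = 0; nid < num_nodes(row_ptr); ++nid) {
    for (long k = 0; k < row_ptr[nid + 1] - row_ptr[nid]; ++k) {
      visit(static_cast<int>(nid), static_cast<int>(k));
    }
  }
}
```

Input and parameter nodes (op "null") also own data entries, so this loop visits them too; skip them if only operator outputs are of interest.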