TVM + MXNet compatibility


#1

TVM and MXNET share common dependencies bundled in their respective 3rdparty directories, and it is common to import both tvm and mxnet in the same project, as in the from_mxnet.py tutorial.

Previously I encountered incompatibility issues after pulling and building the latest tvm and mxnet and then using them in the same script. It was some time ago, but I believe the problem was related to differences between the two bundled copies of dlpack.h:

  • mxnet/3rdparty/dlpack/include/dlpack/dlpack.h
  • mxnet/3rdparty/tvm/3rdparty/dlpack/include/dlpack/dlpack.h

I have been using the commit pointer of mxnet/3rdparty/tvm as a tested-and-compatible version indicator. However, I recently encountered a TVM issue that is fixed in a version newer than the submodule pointer maintained in MXNET, and I was unclear on the best update policy in this scenario.
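As an aside on using the submodule pointer as the indicator: the superproject records the exact tested commit as a mode-160000 "gitlink" tree entry, which `git ls-tree` can read directly. A minimal sketch, using two throwaway local repos (`sub` and `super` are hypothetical stand-ins for tvm and incubator-mxnet):

```shell
# Create a stand-in "tvm" repo with one commit (identity is set inline so
# the sketch runs without global git config).
git init -q sub
git -C sub -c user.email=x@example.com -c user.name=x commit -q --allow-empty -m init

# Create a stand-in "mxnet" superproject and pin the submodule, the way
# incubator-mxnet pins 3rdparty/tvm.
git init -q super
git -C super -c protocol.file.allow=always submodule add -q "$PWD/sub" 3rdparty/tvm
git -C super -c user.email=x@example.com -c user.name=x commit -qm "pin tvm"

# The mode-160000 gitlink entry is the exact pinned (tested) commit:
git -C super ls-tree HEAD 3rdparty/tvm
```

On a real checkout, `git -C incubator-mxnet ls-tree HEAD 3rdparty/tvm` (or `git submodule status`) shows the TVM commit the MXNET build was tested against.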

Are the compatible versions of MXNET and TVM documented, or is there a heuristic that might serve to indicate whether two versions of MXNET and TVM are expected to work together? (The previous issue didn’t show up until runtime tests with the TVM tutorials.) Do people typically use the latest TVM and MXNET versions together without issue?


#2

what incompatibility issues did you meet?

I think dlpack.h is the same between tvm and mxnet.
There is an ABI incompatibility issue in MXTVMBridge when mxnet and tvm are built with different compilers.


#3

what incompatibility issues did you meet?

I was reminded that the issue occurred while attempting to build MXNET and TVM from a common set of shared/installed dependencies in CMake:

find_package(dlpack CONFIG REQUIRED) 
target_link_libraries(tvm dlpack::dlpack)

… etc.

The goal was to support use of TVM and MXNET in the same project while avoiding ODR conflicts by design.

That approach encountered problems because TVM and MXNET bundle different versions of the same dependencies.

git clone --recursive https://github.com/apache/incubator-mxnet.git
comm -12 <(ls incubator-mxnet/3rdparty) <(ls incubator-mxnet/3rdparty/tvm/3rdparty)
dlpack
dmlc-core

dlpack

diff incubator-mxnet/3rdparty/dlpack/include/dlpack/dlpack.h incubator-mxnet/3rdparty/tvm/3rdparty/dlpack/include/dlpack/dlpack.h  | head -10
16c16
< #define DLPACK_VERSION 010
---
> #define DLPACK_VERSION 020

dmlc-core

diff incubator-mxnet/3rdparty/dmlc-core/include/dmlc/base.h incubator-mxnet/3rdparty/tvm/3rdparty/dmlc-core/include/dmlc/base.h  | head -4
175,179d174
< #if DMLC_USE_FOPEN64 && \
<   (!defined(__GNUC__) || (defined __ANDROID__) || ((defined __MINGW32__) && !(defined __MINGW64__)))
< #define fopen64 std::fopen

NNVM

They also share common NNVM sources, as MXNET includes a subset of source code from the TVM submodule directly in its build.

To be clear: after building with the default, documented build instructions, the runtime tests worked fine. That wasn’t clear in the original post, but I had been away from this effort for some time and had forgotten the underlying issue. Since TVM and MXNET are quite tightly coupled, it isn’t clear that MXNET will always be compatible with the latest TVM. There seem to be two closely related questions:

  1. Which TVM NNVM (bundled) sources are compatible with a given MXNET build? (Are TVM NNVM updates always tested against the latest MXNET builds?)

  2. Should MXNET and TVM add explicit version checks, both for compatibility and to help resolve dependency conflicts? (How would compatibility be ensured in a package manager or Linux distribution, where you would ideally want a single version of each library in /usr/local or wherever?)
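On question 2, a sketch of what such an explicit pre-build check could look like: extract `DLPACK_VERSION` from each bundled header and fail on a mismatch. The `copy_a`/`copy_b` headers below are stand-ins created for illustration; in a real tree the paths would be the two bundled dlpack.h copies listed in the first post.

```shell
# Stand-in headers reproducing the mismatch shown in the diff above.
mkdir -p copy_a copy_b
printf '#define DLPACK_VERSION 010\n' > copy_a/dlpack.h
printf '#define DLPACK_VERSION 020\n' > copy_b/dlpack.h

# Pull the macro value out of each header and compare.
v1=$(awk '$1 == "#define" && $2 == "DLPACK_VERSION" { print $3 }' copy_a/dlpack.h)
v2=$(awk '$1 == "#define" && $2 == "DLPACK_VERSION" { print $3 }' copy_b/dlpack.h)
if [ "$v1" != "$v2" ]; then
  echo "DLPACK_VERSION mismatch: $v1 vs $v2"
fi
# prints: DLPACK_VERSION mismatch: 010 vs 020
```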


#4

Current MXNet actually relies heavily on NNVM; compatibility of the other building blocks is not guaranteed. We are actually thinking about keeping MXNet and TVM on the same page.


#5

We are actually thinking about keeping MXNet and TVM on the same page.

Merging TVM and MXNET seems like a good solution.

For reference, it looks like the same thing will happen with mshadow.


#6

It is proposed that mshadow be merged into mxnet. This should be a much easier transition, because mxnet uses mshadow as its core tensor operator library.