Clojure bindings for TVM


#1

We have been working on and off for a while on clojure bindings for TVM and have decided to bring them to a wider audience.

We tried both javacpp and JNA and found JNA to be the way to go in the end. This means we bind dynamically to the tvm installed on the system which makes more sense than attempting to build labelled jars with the huge array of options possible.

We are trying to bring TVM to a wider audience and promote it for bespoke numeric calculations potentially outside of its deep learning roots. We would also like to see more java integrations to DMLC projects that allow easier access to multiple languages; ones that force unnecessary dependencies on the scala compiler also force dependencies on the root jvm runtime (mxnet, we are looking at you).

In any case, I took a crack at explaining the theory behind the compiler here:
http://techascent.com/blog/high-performance-compilers.html

Here is the python schedule tutorial expanded a bit:
http://techascent.com/blog/tvm-for-the-win.html

And here is the project itself:
https://github.com/tech-ascent/tvm-clj

Note that we enable a direct operation on native pointer backed things (like opencv images):

We want to thank everyone who has helped us build this system and are excited to watch it develop its potential further!


#2

Thanks for bringing the discussion here. Do you see any chance of reusing the JNI implementation in the current java binding (tvm4j)?


#3

I see a way, over time, to pull functionality out of the low level binding layer of tvm-clj and into the base java libraries of tvm. I am not in favor of the stack abstraction used in the current java bindings (unnecessary and not threadsafe) and there is no need for a bespoke JNI layer as tvm already has a fantastic c bindings layer and JNA works well with it.

I think working over time to move base functionality out of tvm-clj and into the java bindings is a good way forward. The first step of this would be to eliminate the bespoke JNI layer and use instead jna: https://github.com/java-native-access/jna. Wrap every C function exported in runtime, dsl, and any others and make it late binding (so you could, for instance, choose to load the runtime only and the only thing that happens is if you call a compiler function it throws an exception) and then stop and export only this to maven central.
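To make the late binding idea concrete, here is a minimal plain-Java sketch. Everything in it is hypothetical: `LateBoundLibrary`, `NativeFn`, and the symbol names are made up for illustration, and the `resolver` is a stand-in for whatever real symbol lookup a JNA-based binding would perform. The point is only that a missing symbol fails at the call site, not when the library is loaded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: resolve native symbols on first call, not at load time. */
final class LateBoundLibrary {
    /** Stand-in for a native entry point; a real binding would hold a function pointer. */
    interface NativeFn {
        Object invoke(Object... args);
    }

    // Assumed lookup hook: returns null when the loaded library does not export the symbol.
    private final Function<String, NativeFn> resolver;
    private final Map<String, NativeFn> cache = new ConcurrentHashMap<>();

    LateBoundLibrary(Function<String, NativeFn> resolver) {
        this.resolver = resolver;
    }

    /** Resolves the symbol on first use; only this call fails if it cannot be bound. */
    Object call(String symbol, Object... args) {
        // computeIfAbsent records no mapping when the resolver returns null.
        NativeFn fn = cache.computeIfAbsent(symbol, resolver);
        if (fn == null) {
            throw new UnsupportedOperationException(
                "symbol not exported by the loaded library: " + symbol);
        }
        return fn.invoke(args);
    }

    public static void main(String[] args) {
        // Simulated "runtime only" symbol table; the names are invented for this example.
        Map<String, NativeFn> runtimeOnly = new HashMap<>();
        runtimeOnly.put("runtime_add", a -> (Integer) a[0] + (Integer) a[1]);

        LateBoundLibrary lib = new LateBoundLibrary(runtimeOnly::get);
        System.out.println(lib.call("runtime_add", 2, 3));
        try {
            lib.call("compiler_lower"); // not present in the runtime-only table
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, loading only the runtime is harmless: compiler functions simply throw when someone actually calls them.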

I can then remove my custom bindings at the function level and we can start to share more code.

You can see the custom bindings in places like this:

Note they are essentially one liners that marshal arguments into whatever form the function takes. In java I guess you would just type out the arguments.

Note I have written jna struct bindings for the dltensor types which can form the basis of both these bindings and some of the mxnet bindings:

In this way we can begin to get the benefits of the larger architecture in terms of sharing code and functionality across the DMLC ecosystem on the JVM. I mean really, you could have one java binding library that contained just the JNA bindings for xgboost, tvm, nnvm, and mxnet. All of which are late binding and of course based on different base shared libraries. Then we can start to move forward with a bit more velocity in terms of unifying all these things at the jvm level.

What do you think of this plan?


#4

I like the idea of using JNA. The only concern is that the current tvm4j_full.jar is around 270KB, while jna.jar itself is 1.4MB. In this case I personally don’t think it is a big deal, since the APK size limit on Google Play is 100MB. But we do need to take file size into consideration since we’re also targeting edge devices. cc @tqchen if you have comments.

btw could you clarify what is the unnecessary & not-threadsafe issue?


#5

I do think that JNI is great for the reason @yzhliu mentioned (minimalism) for cross-platform support.


#6

This is the stack abstraction I was talking about:

  native void tvmFuncPushArgLong(long arg);

  native void tvmFuncPushArgDouble(double arg);

  native void tvmFuncPushArgString(String arg);

  native void tvmFuncPushArgBytes(byte[] arg);

  native void tvmFuncPushArgHandle(long arg, int argType);

That stack abstraction. Looking carefully at the jni bindings, I can see that it is threadsafe; sort of. It has no unwind mechanism and I really don’t see why the function interface isn’t just a generic object array that you can then do rtti on and convert to the actual tvm value objects. It is storing hidden state and so depending on how the upper layers are coded this state always has a chance to interact in an undefined fashion with code running later. As I said, I feel it is unnecessary.
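As a rough illustration of the alternative, the sketch below takes the whole argument list as one object array and converts each element by its runtime type. The names (`ArgMarshaller`, `Tag`, `TaggedValue`) are hypothetical, the tags are not TVM’s actual type codes, and handle arguments are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: marshal all arguments in one call, with no state kept between calls. */
final class ArgMarshaller {
    /** Illustrative tags only; these are not TVM's real type codes. */
    enum Tag { LONG, DOUBLE, STRING, BYTES }

    /** One converted argument: a tag plus the value in a canonical Java form. */
    static final class TaggedValue {
        final Tag tag;
        final Object value;

        TaggedValue(Tag tag, Object value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Converts each argument by runtime type; an unsupported type aborts the whole call. */
    static List<TaggedValue> marshal(Object... args) {
        List<TaggedValue> out = new ArrayList<>(args.length);
        for (Object arg : args) {
            if (arg instanceof Long || arg instanceof Integer) {
                out.add(new TaggedValue(Tag.LONG, ((Number) arg).longValue()));
            } else if (arg instanceof Double || arg instanceof Float) {
                out.add(new TaggedValue(Tag.DOUBLE, ((Number) arg).doubleValue()));
            } else if (arg instanceof String) {
                out.add(new TaggedValue(Tag.STRING, arg));
            } else if (arg instanceof byte[]) {
                out.add(new TaggedValue(Tag.BYTES, arg));
            } else {
                throw new IllegalArgumentException("unsupported argument type: "
                    + (arg == null ? "null" : arg.getClass().getName()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<TaggedValue> converted = marshal(1L, 2.5, "name", new byte[] {1, 2});
        for (TaggedValue v : converted) {
            System.out.println(v.tag);
        }
    }
}
```

Because the converted list is a local value, an exception thrown halfway through leaves nothing behind that a later call could trip over, which is the unwind concern raised above.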

JNA is more minimal in the sense that you do not have to write a jni.c file, nor do you have to compile and bundle an extra shared library. With JNA, I can publish a jar to maven that contains no further binary artifacts, that will work with either tvm or tvm-runtime with no further changes, and that will continue to work as the shared library changes. JNA is less work, overall, than writing bespoke JNI.

If I were to go the bespoke JNI route, I would use javacpp which does your JNI bindings for you automatically as I did in the first version. I believe that javacpp was designed to work on both the main JVM and Dalvik. Javacpp is also less work, overall, than bespoke JNI.

So, I don’t understand the claim to minimalism. I do understand the need for small APK sizes, and this had absolutely not occurred to me. While at NVIDIA, I developed custom JNI bindings for a graphics subsystem to use in their automotive platforms, so I chose the same route that TVM has currently chosen, but I was not as educated about the java ecosystem as I am now and I didn’t have a well designed C interface to our system already in place.

It may be worth it to keep the current bindings, but there isn’t a good reason for me to use them: they don’t give raw access to the C API, they don’t link to the compiler or DSL APIs, and their array access is quite incomplete in the sense that it doesn’t expose the fact that arrays are DLTensors (you can’t get or set the strides or the root data), and so on. They appear to be specialized to precisely their use case of running tvm code in a specific context. The clojure bindings are exactly equivalent to the python bindings, so if you imagine trying to put the python bindings on top of the current jvm bindings in tvm, you can see that this is not even a remote possibility. So in order for us to share code, the root jvm bindings need to be:

  1. Greatly expanded in scope.
  2. The shared library has to have a more dynamic lookup mechanism. One thing I want to do is use tvm with mxnet, in which case I will be dynamically binding to the mxnet shared library instead of tvm.
  3. Raw access to the C API. No wrappers; I don’t want to figure out layers of abstractions on top of the base abstraction.
  4. On Maven Central.
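To show why access to strides and the root data matters, here is a small plain-Java sketch of the addressing arithmetic. `TensorView` is a hypothetical class, not part of tvm4j or tvm-clj; it mirrors only the DLTensor fields needed for addressing (shape, strides counted in elements, and a byte offset from the root data pointer), and it assumes missing strides mean a compact row-major layout.

```java
/** Hypothetical plain-Java view of the DLTensor fields that matter for addressing. */
final class TensorView {
    final long[] shape;
    final long[] strides;  // in elements, not bytes; null is treated as compact row-major
    final long byteOffset; // offset of the first element from the root data pointer
    final int elemBytes;   // size of one element in bytes

    TensorView(long[] shape, long[] strides, long byteOffset, int elemBytes) {
        this.shape = shape;
        this.strides = strides;
        this.byteOffset = byteOffset;
        this.elemBytes = elemBytes;
    }

    /** Row-major strides implied when the strides field is absent. */
    static long[] compactStrides(long[] shape) {
        long[] s = new long[shape.length];
        long acc = 1;
        for (int i = shape.length - 1; i >= 0; i--) {
            s[i] = acc;
            acc *= shape[i];
        }
        return s;
    }

    /** Byte offset of the element at the given index, relative to the root data pointer. */
    long byteOffsetOf(long... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("index rank does not match tensor rank");
        }
        long[] st = (strides != null) ? strides : compactStrides(shape);
        long elems = 0;
        for (int i = 0; i < shape.length; i++) {
            if (index[i] < 0 || index[i] >= shape[i]) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + i);
            }
            elems += index[i] * st[i];
        }
        return byteOffset + elems * elemBytes;
    }

    public static void main(String[] args) {
        // A 2x3 float tensor and a transposed 3x2 view over the same root data.
        TensorView base = new TensorView(new long[] {2, 3}, null, 0, 4);
        TensorView transposed = new TensorView(new long[] {3, 2}, new long[] {1, 3}, 0, 4);
        System.out.println(base.byteOffsetOf(1, 2));
        System.out.println(transposed.byteOffsetOf(2, 1));
    }
}
```

The transposed view addresses the same bytes as the base tensor without copying anything, which is only possible when the binding exposes strides and the root pointer.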

As an aside, for edge devices I have been considering node bindings. I looked into node’s ffi system and it is a mess but nodeffi would get you both android and ios devices, potentially. And potentially access to at least 4 million developers.


#7

Let me elaborate on my reasoning a bit more. There were proposals about JNI vs JNA vs javacpp in a past project that I worked on, and this discussion is more of a summary of my past experience.

Before we dive into details, let us be reminded that there is no free lunch in the FFI world. In every case we need to translate the JVM data structure into off-heap data that can be used by the C API. That translation happens either in the JNI wrapper or, in the case of JNA, inside the JNA library.
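As a small illustration of that translation step, the sketch below copies a heap `float[]` into a direct `ByteBuffer` in native byte order, which is the kind of memory native code can read. `OffHeapCopy` and its methods are hypothetical names; no native call is made here, only the copy in each direction.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical sketch of moving data between the JVM heap and off-heap memory. */
final class OffHeapCopy {
    /** Copies a heap array into a direct (off-heap) buffer in native byte order. */
    static ByteBuffer toOffHeap(float[] src) {
        ByteBuffer buf = ByteBuffer.allocateDirect(src.length * Float.BYTES)
                                   .order(ByteOrder.nativeOrder());
        // The float view has its own position, so buf itself stays at position 0.
        buf.asFloatBuffer().put(src);
        return buf;
    }

    /** Copies the contents of a buffer produced by toOffHeap back onto the heap. */
    static float[] toHeap(ByteBuffer buf) {
        FloatBuffer view = buf.asFloatBuffer();
        float[] out = new float[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer offHeap = toOffHeap(new float[] {1f, 2.5f, -3f});
        System.out.println(offHeap.isDirect());
        System.out.println(toHeap(offHeap).length);
    }
}
```

Whichever FFI is used, a copy like this (or pinning of the array) has to happen somewhere; the choice is only about who writes and maintains that code.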

JNA

JNA provides a nice abstraction that removes the burden of writing JNI. It does come with a libffi dependency, and I imagine there is still some JNI boilerplate inside JNA itself. Because it has to work for every C API, the resulting dependency is larger than what we need to support a limited set of APIs directly through JNI.

JNI

JNI can be viewed as the “assembly” of FFI: it gives you maximum control, but also demands that you write code carefully. It is definitely not fun to write JNI, but we can get it right if we are careful, including thread safety both in terms of the C API and in terms of JVM thread attachment/detachment.

Why the current tvm4j chooses JNI

The main reason is the minimalist dependency @yzhliu mentioned. If the project had an ever-growing set of C APIs, I think the JNI route would not make sense and we should pick JNA instead for maintenance reasons. However, past lessons taught us to design a fixed C API and use the PackedFunc and Node systems to grow the APIs that get exposed. This means we only have to get it right once and do not need to keep adding functionality to the binding. In some sense, the TVM java runtime system can be viewed as a minimal, specialized version of JNA that only works with the TVM runtime.
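A toy model of that design, in plain Java: the binding surface is two fixed operations (look a function up by name, call it with a generic argument list), and new functionality shows up as new names. `FunctionRegistry`, `PackedFn`, and the function name `example.add` are hypothetical and only mimic the idea; this is not the tvm4j API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of a fixed lookup-and-call surface with a growing set of names. */
final class FunctionRegistry {
    /** Uniform call signature shared by every exposed function. */
    interface PackedFn {
        Object call(Object... args);
    }

    private final Map<String, PackedFn> table = new ConcurrentHashMap<>();

    /** In this model the native side would do the registering; the binding never changes. */
    void register(String name, PackedFn fn) {
        table.put(name, fn);
    }

    /** Fixed entry point: look a function up by name. */
    PackedFn get(String name) {
        PackedFn fn = table.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("no such function: " + name);
        }
        return fn;
    }

    public static void main(String[] args) {
        FunctionRegistry registry = new FunctionRegistry();
        // Adding a function requires no change to FunctionRegistry itself.
        registry.register("example.add", a -> (Long) a[0] + (Long) a[1]);
        System.out.println(registry.get("example.add").call(2L, 3L));
    }
}
```

The trade-off under discussion is visible here: the hand-written layer stays tiny, but every call goes through name lookup and generic argument conversion instead of a typed function per C symbol.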

The limitation of the current JNI binding (not supporting Node) is mainly due to its original design goal of only supporting the runtime. I think it should not be hard to add support for the Node system and allow other JVM-based packages to depend on it to make use of the compiler itself.


#8

So to summarize, I think there are great lessons to be learnt from the clojure JNA binding that could make the core tvm4j binding work for both the compiler API and the runtime (while keeping the runtime minimal when the user only links against it). It is also likely that we can do it once through JNI to remove a layer of abstraction and keep it minimal.


#9

Regardless of the choice of FFI (JNI or JNA), I think it makes sense to discuss a proposal for a core tvm4j API that is pure java and wraps the underlying library, so that it meets the needs of the current clojure binding as well as the runtime.


#10

Awesome, that would really help and I would love to be part of that discussion.


#11

Since you have already created the clojure bindings, @chrisn, can you make a strawman proposal for changes to the current set of tvm4j APIs, and then we start from there?


#12

Yes absolutely. I would be happy to.