Basically, we already have WASM support through the LLVM backend and our Rust runtime. However, we have no documentation on how to generate and use a WASM library with TVM, and we have no way to autotune the binary.
I have some ideas for these and would like to contribute, but I’m not sure whether I’m on the right track. I’d like to hear opinions from experts.
How to generate a deploy lib in the WebAssembly format with TVM
Building a static WASM library is easy. Set the build target to ‘llvm -target=wasm32-unknown-unknown --system-lib’ and save the static lib with Module.save.
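For reference, here is a minimal sketch of the static build described above. The target string is the one quoted in this post; the kernel (`add_one`), the output file name, and the helper `build_static_wasm_lib` are hypothetical names chosen for illustration. It assumes a TVM build with LLVM enabled and the `te` scheduling API (`te.create_schedule`), which may differ between TVM versions, so treat it as a starting point rather than the canonical recipe.

```python
# Target string quoted from the post; other TVM versions may spell the flags differently.
WASM_TARGET = "llvm -target=wasm32-unknown-unknown --system-lib"


def build_static_wasm_lib(out_path="add_one.o"):
    """Compile a trivial kernel for the WASM target and save it as an object file.

    Assumption: TVM is installed with LLVM (including the WebAssembly backend)
    enabled, and exposes the `te` schedule API used below.
    """
    # Imported lazily so this file can be loaded on machines without TVM.
    import tvm
    from tvm import te

    # Hypothetical example kernel: B[i] = A[i] + 1
    n = te.var("n")
    A = te.placeholder((n,), name="A")
    B = te.compute(A.shape, lambda i: A[i] + 1.0, name="B")
    s = te.create_schedule(B.op)

    # --system-lib makes the generated code register its functions
    # through TVMBackendRegisterSystemLibSymbol at startup.
    mod = tvm.build(s, [A, B], target=WASM_TARGET, name="add_one")

    # Module.save writes the compiled module; the ".o" suffix selects an object file.
    mod.save(out_path)
    return out_path
```

The resulting object file can then be archived (for example with `llvm-ar`) and linked into the Rust program as a static library.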
Building a shared WASM library looks challenging. clang doesn’t accept the ‘-shared’ option for the WASM target. I managed to create a shared lib anyway, but I still got an error when I linked it from Rust programs:
rust-lld: error: ../../libtvmwasm.wasm: not a relocatable wasm file
Is there any way to generate a relocatable wasm lib?
How to use the generated WASM library from other programs
I tried the following two Rust programs with the generated static lib.
Create a WASM binary with WASI and use it from wasmtime.
In my environment, Rust optimized out the link to the deploy lib because we don’t reference any of its symbols explicitly; we call its functions via PackedFunc. To prevent this, I had to add a function that makes the dependency on the lib explicit.
Create a wasCC actor and provide inference serving via HTTP.
In my environment, the lib’s TVMBackendRegisterSystemLibSymbol was never called, so all of the get_function() calls failed. I had to call __wasm_call_ctors() explicitly to trigger TVMBackendRegisterSystemLibSymbol.
Autotuning WASM binaries
Currently, WASI doesn’t support networking, so it looks impossible for a WASM program to work as an RPC server. For AutoTVM, we therefore need a host-side WASM runtime, written in Rust or C, to execute the WASM functions.
[A0] Rust: We can use the wasmtime crate. We also have to add RPC support to the Rust frontend. It might be easy to migrate to pure WASM in the future, once WASI supports networking.
[A1] C: We can implement it with the WASM C++ API. I’m not sure how difficult it is, but it looks feasible to me.
The same seems to hold for other DL frameworks. For example, I tried ONNX.js in several environments, including mobile phones, and WASM was slower than WebGL (cf. https://microsoft.github.io/onnxjs-demo/#/resnet50).
I guess it is because WASM doesn’t support threading natively yet. Or am I missing something for WASM optimization?
Any comments would be appreciated!