D2L-TVM: A TVM introduction book

Hi all,

You may know that we are adding numpy-compatible operators to MXNet. We asked interns (most of them undergraduate students) to implement these operators based on TVM. The feedback we got is that the documents on tvm.ai are not easy to follow for beginning developers.

About 2 months ago I started to write notebooks for these developers, assuming they only know numpy. Its format is like other D2L projects such as https://d2l.ai: every chapter is a runnable notebook that can be read in 10 minutes, with all necessary background information. All the notebooks should be consistent and printable as a textbook.

My progress hasn't been fast since my time is quite limited; so far there are 17 notebooks on http://tvm.d2l.ai/. I expect the first release to have at least 40 chapters, so it's still a very early draft.

There isn't new content in the repo; most of it is already covered on tvm.ai or in the topi folder. This project is more of a rewrite of that content for beginning developers.

I'd like to hear your suggestions about this project, and it would be even better if some of you are interested in contributing.

Thank you
Mu


Using notebooks for tutorials seems promising and attractive!
One suggestion in my mind is that if we are going to ask the community to maintain this book, we should use it to replace the current tutorials to avoid duplicated efforts.


Awesome work! It’s great to see the tutorials presented in a form that makes them even easier to digest for newcomers.

Perhaps going even further, it would be great if the notebooks could be run out of the box in a colab instance. This would require pre-built docker images with a somewhat up to date version of TVM.

Regarding the tutorials, I agree with @comaniac, minimizing duplication is important. Would you be open to having the sources of the introduction book on the Apache TVM repo, rather than a separate one?

Supporting colab is a good idea. We tried it for other d2l projects, but for obvious reasons we didn’t put it on the homepage.

Based on our previous experience, we should have a single owner who can guarantee all chapters are consistent in math, code, and style. The quality bar for a textbook is way higher than for a bunch of tutorials. If someone is interested in this, please ping me.

The sources are markdown files (without cell outputs) and SVG files, so it's possible to have them in the TVM repo. The issue is the CI: 1) I hope to use a stable release instead of the master branch of TVM; 2) a lot of PRs just fix typos and polish English, and it's unnecessary for them to trigger other tests; 3) the repo needs to be built on a particular instance, e.g. an EC2 P3, with some special setup, such as being connected to edge devices.

Here is one way to integrate the doc: every time the tvm.ai docs are built, the CI pulls the Jupyter notebooks from http://tvm.d2l.ai/d2l-tvm.zip and adds them into the sources. The reason is that the notebooks in the GitHub repo don't contain outputs, while this zip file has all outputs, tested by the CI.
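To make the idea concrete, a CI step along these lines could do the merge. This is just a sketch of the suggestion above; the function name and the docs directory layout are my assumptions, not an actual TVM CI configuration.

```python
# Hypothetical CI step: take the zip of CI-tested notebooks (which include
# cell outputs) and merge the .ipynb files into the docs source tree.
import io
import zipfile


def merge_notebooks(zip_bytes: bytes, docs_dir: str) -> list:
    """Extract executed .ipynb files from the downloaded zip into docs_dir."""
    added = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".ipynb"):
                zf.extract(name, docs_dir)  # keeps the chapter subdirectories
                added.append(name)
    return added


# In CI this would be driven by something like:
#   import urllib.request
#   zip_bytes = urllib.request.urlopen("http://tvm.d2l.ai/d2l-tvm.zip").read()
#   merge_notebooks(zip_bytes, "docs/notebooks")
```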

A textbook-like material would definitely help beginners understand TVM and grow the community!

Have you considered listing the topics the book wants to cover, like a roadmap issue, so that contributors can pick tasks they are interested in? We could have a small group of reviewers to guarantee the quality of the content.


Thanks for the initiative! A good example the community could learn from is Rust's textbook material: https://www.rust-lang.org/learn It would be great if the lessons we learn could flow back to the current docs and tutorials.

The concern about creating a separate repo is valid. I think the main thing we want to figure out is the governance model around the book project. It would be great to also follow the Apache way of governance (e.g. giving commit access to committers, licensing of the content) and to put the repo in a neutral place.

It would also be useful to clarify in the beginning whether we:

  • treat the book project as part of the core project, or
  • treat it as a separate project that enhances the TVM ecosystem, which means it could have its own GitHub org, etc.

Either way it would be great to have a neutral repo and a clear Apache-style governance model in which a broader community could participate.

In the meanwhile, we could use the lessons to help improve the current docs and tutorials as well.

The governance model is a good question, and the Apache model may work. The difference here is that this aims to be a book, and a book should have a clear author list, e.g. the Rust book:

by Steve Klabnik and Carol Nichols, with contributions from the Rust Community

I expect we can have 2-4 developers who spend a significant amount of time on it to be the authors, and the authors will maintain the book (committers/PPMC in the Apache model). Similar to other D2L books, the book will be free to download and use, and the authors may forgo remuneration to keep the price of the physical book down.

If there is a TVM org, it should be straightforward to have the repo be part of it. But we have the same problem as MXNet: it's not a good idea to have the book source within the TVM source code repo, and hosting it under Apache may complicate the copyright issue, which needs to be resolved before printing the book. My current thought is to move the repo to the d2l-ai org once the legal review is finished. There are several other on-going d2l-ai projects with collaborators from multiple universities and companies, so it is moving towards a neutral organization.

That’s a good suggestion, will do it.

Nice work! I am a fan of Dive into Deep Learning. I think I could contribute some parts of Dive into Deep Learning Compiler :grin:


Dive into Deep Learning is very very good

I'm confused. A special hardware accelerator needs a specific compiler to generate its hardware instructions. Does your compiler just generate instructions that run on x86 or Arm? Wouldn't that be very slow and inefficient? I still don't get your point: how do other hardware platforms support your platform?

I think TVM is not an actual compiler; it is built on lower-level compilers like LLVM and NVCC (CUDA), and those lower-level compilers translate the output of TVM into machine code.

A compiler doesn't have to compile to machine code.

Agree with @MarisaKirisame. The definition of a compiler from Wikipedia:

A compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language). The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language, object code, or machine code) to create an executable program.

As a result, any software that performs codegen can be a compiler.
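To illustrate the point that a compiler need not emit machine code, here is a toy source-to-source "compiler" that translates a tiny prefix-notation language into Python source. This is purely an illustrative sketch of the definition above; the mini-language and function names are made up and have nothing to do with TVM internals.

```python
# A toy compiler: translate a prefix expression language ("add 1 mul 2 3")
# into Python source ("(1 + (2 * 3))"). Both the source and target are
# high-level languages; no machine code is involved.

def compile_prefix(expr: str) -> str:
    """Compile a prefix expression into an equivalent Python expression."""
    tokens = expr.split()

    def parse(i):
        tok = tokens[i]
        if tok == "add":
            left, i = parse(i + 1)
            right, i = parse(i)
            return f"({left} + {right})", i
        if tok == "mul":
            left, i = parse(i + 1)
            right, i = parse(i)
            return f"({left} * {right})", i
        return tok, i + 1  # a number literal

    code, _ = parse(0)
    return code


target = compile_prefix("add 1 mul 2 3")  # "(1 + (2 * 3))"
print(eval(target))  # 7
```

In the same spirit, TVM generates LLVM IR or CUDA C as its target language and lets LLVM/NVCC handle the final step down to machine code.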