[RFC][VTA] VTA HW/SW Refactor

thierry · March 12, 2020, 5:57pm

So far the vta directory of the TVM project contains two logical parts – the software components (runtime, JIT, compiler support, TOPI library) and the hardware components (hardware sources, FPGA compilation scripts, drivers, under vta/vta-hw). The hardware and software components communicate through a relatively stable ISA spec and driver API.

In this RFC, we would like to propose moving the hardware components (vta/vta-hw) into a separate repo, the main reasons being:

This separation provides a clear split between the hardware and software components as two distinct “products”. It alleviates the concerns of organizations that want their contributions scoped exclusively as either software changes or hardware changes, which will help us attract broader participation from organizations that care about isolating their contributions to the software-only or hardware-only components of TVM.
The main compiler and runtime support in VTA will continue to exist in the TVM repo, with the same CI and toolchains setup; this is to ensure that we can evolve the compiler toolchain quickly and build first class compiler support for modern accelerators.
The new repo (incubator-tvm-vta) will contain the hardware component of TVM, under the same governance by the Apache TVM community. Contributions to either repo are viewed as contributions to one and the same project. This would also give the community more flexibility to try out new hardware design variants.

Please share your thoughts.

thierry · March 13, 2020, 5:28pm

@vegaluis @liangfu @pasquale

pasquale · March 12, 2020, 11:06pm

I think I would prefer maintaining the current structure which reflects TVM being a vertical e2e stack, from SW to HW targets. In this perspective I view the separation of SW and HW as conducive to old mentalities where SW and HW are separated and with different concerns, rather than favoring co-optimization.

thierry · March 13, 2020, 12:38am

Thanks for the valuable input @pasquale; I 100% agree that the strength of TVM and VTA is that they present a full stack compiler and hardware design (with cycle accurate sim) to perform deep learning co-design R&D, which is really quite powerful and unique! We don’t want to break that synergy between hardware and software in any way moving forward.

As such, VTA will always be an integral piece of TVM; e.g. in the TVM CIs the VTA backend (with TSIM) will always be exercised and tested to avoid bitrot from TVM modifications.

To reiterate, the key benefit is to gain new contributions from companies or organizations that might have IP that conflicts with TVM’s hardware sources. Isolating VTA’s hardware into a separate “product” will allow for more backends to be added in TVM, which greatly benefits the community as a whole.

Another way to view this re-org is that it can also encourage people to upstream their TVM back-end support for open source hardware designs like RISC-V processors or NVDLA. Next to the vta-hw submodule in TVM, we can have a collection of open source hardware submodules while the compiler support lives inside of TVM. These new backends can be tested with TVM CIs to ensure that they are supported in the latest TVM. This prevents fork divergence and project fragmentation when people want to add new backends to TVM.

A final benefit is that the vta-hw repo can host hardware / FPGA CI tests that would be too expensive to run in TVM on every new PR. This can ensure that our hardware sources are thoroughly tested moving forward, since hardware contributions are 1-2 orders of magnitude less frequent than software contributions.

liangfu · March 13, 2020, 6:31am

@thierry Thanks for proposing the RFC. I fully support such change, and totally agree with the benefits that you have mentioned above.

I have some concerns about the following aspects:

With the repos separated, I think changes to the VTA hardware should no longer trigger all TVM CI tests; at the very least, tests with GPU backends are no longer necessary. This way, we can extend the test coverage of the VTA hardware without worrying too much about the testing burden in CI.
As we always value contributions to the code base, splitting the repo might break the authorship in the commit history. How would we deal with that?
Would you please kindly clarify whether the new repo would be available under github.com/dmlc or github.com/apache?

liangfu · March 13, 2020, 6:40am

@hjiang might be interested in this topic as well.

tqchen · March 13, 2020, 4:07pm

To specifically discuss the point raised by @liangfu.

I think it makes sense to bring more CI coverage to the VTA hardware. Given that the proposal puts the new repo under the same community, it will be apache/incubator-vta-hw. We can use git's history mechanisms to make sure that the author commit history is preserved in the new repo, which is possible.
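The thread does not say which git mechanism would be used to preserve history. As one possible illustration only, `git subtree split` extracts the commits that touched a subdirectory into a new branch while keeping each commit's author and date (commit hashes change). The sketch below just builds and optionally runs that command; the branch name `vta-hw-only` and the helper names are hypothetical, and `vta/vta-hw` is the path mentioned in the RFC.

```python
import subprocess
from typing import List


def subtree_split_command(prefix: str, branch: str) -> List[str]:
    """Build a `git subtree split` invocation for one subdirectory.

    The resulting branch contains only the history of `prefix`,
    with original authors and dates retained.
    """
    return ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch]


def split_history(repo_dir: str, prefix: str, branch: str) -> None:
    """Run the split inside an existing clone (needs git with git-subtree)."""
    subprocess.run(subtree_split_command(prefix, branch), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Only prints the command; call split_history() in a real clone to run it.
    # "vta-hw-only" is a hypothetical branch name.
    print(" ".join(subtree_split_command("vta/vta-hw", "vta-hw-only")))
```

The split branch could then be pushed to the new repository, so that `git log` there still shows the original authors.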

thierry · March 13, 2020, 5:33pm

Thanks @liangfu for the input, and good call on adding @hijiang to the discussion; I couldn’t remember his discuss username since it doesn’t match his github.

Regarding your points:

We can be more selective about which tests to run in vta-hw tests indeed and run more exhaustive testing on hardware, while skipping many TVM tests that would unnecessarily lengthen the runtime of CI.
We won’t break authorship commit history during the transition, and we will make sure every author has their history preserved!
And as Tianqi mentioned the repo will remain under Apache TVM; so all roles remain the same and contributors to hw-vta are offered opportunities to keep their roles and get promoted within the Apache org.
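To illustrate the first point above (being selective about which tests run in which repo), here is a minimal sketch. Everything in it is an assumption made for illustration: the suite names and the catalog do not correspond to actual TVM or vta-hw CI jobs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Suite:
    name: str                 # hypothetical suite name
    repos: Tuple[str, ...]    # repos whose CI should run this suite


# Hypothetical catalog: exhaustive hardware tests live in vta-hw,
# while GPU tests stay in TVM only.
CATALOG = [
    Suite("tvm_unit", ("tvm",)),
    Suite("tvm_gpu", ("tvm",)),
    Suite("vta_tsim_short", ("tvm",)),
    Suite("vta_tsim_long", ("vta-hw",)),
    Suite("tvm_vta_integration", ("tvm", "vta-hw")),
]


def suites_for(repo: str) -> List[str]:
    """Return the names of the suites that the given repo's CI should run."""
    return [s.name for s in CATALOG if repo in s.repos]


if __name__ == "__main__":
    print(suites_for("vta-hw"))
```

With this layout, a change in vta-hw never triggers the GPU suite, while the shared integration suite still runs on both sides.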

hjiang · March 14, 2020, 9:09pm

Hi @liangfu, thanks for adding me to the thread; this is a very interesting topic for sure.

Hi @thierry,

To “attract broader participation from hardware organizations” is a very good idea, and I totally support the related effort. I have a couple of questions to better understand how the HW/SW refactor will serve that goal, and I think there are some issues we need to address to better help hardware vendors.

#1. About “care about isolating their contributions to software-only or hardware-only components of TVM”: I don’t clearly understand this. If we do open source, why would anyone care whether it is a HW contribution or a SW contribution?

#2. After refactoring vta into the vta-hw repo, once we have a new HW backend that has both software and hardware modules, do we need to upstream separately to tvm and vta-hw? What is the rule for splitting them, e.g. where does the simulator go? Is there a better way to reduce the development effort of maintaining two repos?

#3. After separating VTA, how do we do unit testing when a change happens on both the SW and HW sides? For example, if a VTA ISA instruction gets a new field, both the vta runtime and TSIM/simulator/FPGA need changes to apply it.

#4. After vta becomes vta-hw, is there any plan to provide a high-level abstraction that hides the lowering pass and parameter packing details, to reduce the complexity of developing a new backend?

Regards

Hua

thierry · March 16, 2020, 7:27pm

Hi @hijiang, thanks for providing your input, there are a lot of great questions!

#1 - While it won’t matter to most contributors from either academic or industry backgrounds, some companies have larger concerns about how they make contributions to open source projects. While making contributions to software projects is a well known and understood topic, it is less so for open source hardware. Some companies’ lawyers want to be overly cautious and want their company to contribute exclusively to open source SW projects, hence the need to isolate all of TVM’s hardware sources from the main TVM repo.

#2 - Yes, when upstreaming changes to both tvm and vta-hw, there will be steps to take to make sure the changes don’t break the CIs, and that both versions can be updated correctly. The simulator bits will have to be in vta-hw. To run the CIs in vta-hw, we’ll need to rely on TVM in some unit tests, so the CI Dockerfiles in vta-hw will need to point to an up to date version of TVM. In TVM, we’ll also exercise the VTA backend in the CI. So essentially, if a change requires modifications of both TVM and VTA:

Create the TVM fork, and create the VTA-HW fork.
Make your TVM fork use your VTA-HW fork as submodule, and demonstrate that CIs are green.
In your VTA-HW fork, update the version of TVM to your own fork in the Dockerfile if needed to show that CIs in the VTA-HW fork pass (this is only necessary if you have to change the hardware/software interface and APIs).
Upstream changes of VTA-HW, with CIs being green.
Upstream TVM changes, and update VTA-HW submodule version.

This is really in the worst case scenario where changing VTA-HW in isolation would break TVM CI, and where updating TVM in isolation would break VTA-HW CI. I’ll make sure to update the TVM docs, and provide these steps whenever someone creates a PR in VTA-HW.
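As a rough sketch of the ordering described above, the function below returns the steps for a change depending on which repos it touches. The function name and flags are hypothetical, and the single-repo cases are a simplifying assumption rather than something spelled out in this thread.

```python
from typing import List


def upstream_steps(touches_tvm: bool, touches_vta_hw: bool,
                   changes_interface: bool = False) -> List[str]:
    """Return the ordered upstreaming steps for a change."""
    if touches_tvm and touches_vta_hw:
        steps = [
            "fork TVM and fork VTA-HW",
            "use the VTA-HW fork as submodule in the TVM fork; show TVM CI is green",
        ]
        if changes_interface:
            # Only needed when the hardware/software interface or APIs change.
            steps.append(
                "point the VTA-HW fork's Dockerfile at the TVM fork; show VTA-HW CI is green"
            )
        steps += [
            "upstream the VTA-HW changes with green CI",
            "upstream the TVM changes and update the VTA-HW submodule version",
        ]
        return steps
    # Assumption: a change confined to one repo only needs that repo's CI.
    if touches_vta_hw:
        return ["upstream the VTA-HW changes with green CI"]
    if touches_tvm:
        return ["upstream the TVM changes with green CI"]
    return []


if __name__ == "__main__":
    for i, step in enumerate(upstream_steps(True, True, changes_interface=True), 1):
        print(f"{i}. {step}")
```

The key design point is the order: VTA-HW lands first, so the TVM change can point its submodule at an upstream VTA-HW commit rather than at a fork.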

#3 - We’ll have CI tests in both, and some will overlap. The VTA-HW tests will be more extensive; for instance we can run longer simulations, and also run bitstream compilation to catch errors that would break FPGA toolchains. On the TVM side, we can scale down the duration of unit tests that rely on TSIM to shorten CI time.

#4 - I think we should aim to reduce the complexity of the current codebase even further, make VTA less of a special case backend, and integrate it more as a “normal” TVM backend. I’m open to seeing more RFCs proposing cleaner cuts between TVM and VTA.

Thanks,

Thierry

liangfu · March 17, 2020, 4:27am

A lot of great discussion so far, thanks to everyone’s input.

I think there is a lot of great stuff here, notably:

We won’t break authorship commit history during the transition
… it will be apache/incubator-vta-hw
… we can run bitstream compilation to catch errors that would break FPGA toolchains.
… make VTA less of a special case backend and integrate it more as a “normal” TVM backend (and possibly, VTA-HW could be the backend for other DL compilers)

I think all the answers make sense, and all of my concerns have been covered. I fully support such change.

thierry · March 17, 2020, 4:45pm

Great, I’ll keep the discussion open for two more days! @hjiang @pasquale @vegaluis let me know if there are more questions I can help clarify.

thierry · March 19, 2020, 7:12pm

Discussion period is over; moving over to a vote: https://github.com/apache/incubator-tvm/issues/5102