End-to-end (TVM+VTA) flow tutorial with Yolo v3

thierry · July 31, 2019, 7:16am

Working on bitpacked operators on VTA would be super interesting. This is a direction I’m looking into enabling in hardware/software, but a lot of work will need to be done on training/quantization to enable it.
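As a rough illustration of what a bitpacked operator computes, here is a minimal sketch of a binary dot product over +1/-1 vectors stored one bit per element. It is not VTA code, and the thread does not describe how VTA would implement this; the helper names `pack_signs` and `binary_dot` are hypothetical and exist only for this example.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into one int; bit i is set when values[i] is +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Matching positions contribute +1 and mismatching ones -1, so the
    result is n - 2 * (number of differing bits).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
packed = binary_dot(pack_signs(a), pack_signs(b), len(a))
reference = sum(x * y for x, y in zip(a, b))  # plain dot product for comparison
print(packed, reference)
```

The packed result and the plain dot product agree. The appeal for small FPGAs is that the multiply-accumulate collapses into an XOR and a bit count.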

In terms of FPGA coverage, are there low-power FPGAs other than the ice40 that @cbalint13 and @kevinyuan would be interested in providing preliminary support for? I think it might be interesting to see if we can instantiate a VTA design on an FPGA ~10x smaller than the one it was originally designed for. We could come up with interesting optimizations, or re-organizations.

cbalint13 · July 31, 2019, 2:24pm

@thierry ,

Some random thoughts on FPGA targets:

Small FPGAs would be interesting for ultra-low-power applications, like TinBiNN showcased by Lattice, or MARLANN, which runs on the ICE40. I am confident that at least state-of-the-art results can be achieved in terms of size and low power consumption. These smaller devices also have the advantage of being synthesizable end-to-end with open-source tools, so they can become very popular, if they are not already, like the upduino board; even Lattice supports and showcases it as a third-party board. The low-power target field is still poorly covered by the industry, so there is a lot of open room.
Also, the Lattice ECP5 (mid-to-low size) series is now supported by the open-source community on boards like TinyFPGA-EX, and if I am not mistaken it is showcased by companies like XNOR.ai here as the industry’s first low-power target for AI applications.
At the high end, standalone FPGAs (with some PCIe) from the Xilinx 7 family are also interesting, especially the affordable ones, e.g. on CrowdSupply. Such a board is large enough to experiment with and should soon be synthesizable with open-source tools too. Such boards can even be built DIY with ease, with no very special requirements or price tag.
True high-end ones like Ultra+ have become inaccessible for many people; however, they can deliver real state-of-the-art performance (though I am not so sure how they compare to ASIC competitors).

thierry · August 1, 2019, 6:58am

Thanks @cbalint13 for the suggestions. It would be great to have a contributor work on Lattice toolchain support. Recently, TVM reviewer @liangfu added support for Intel (formerly Altera) FPGA SoCs. We could perhaps pick a Lattice FPGA that has microcontroller support. Thoughts?

thierry · August 1, 2019, 6:59am

I also realize that we’ve diverted from the topic of the original thread, so feel free to add a new one.

cbalint13 · August 2, 2019, 9:01pm

@thierry, @kevinyuan,

I prepared an end-to-end demo script (on CPU) here that does the following:

takes yolov3-tiny (can be ‘yolov3’, ‘yolov2’ but not tested)
imports it into a Relay graph
quantizes the net using KL statistics (latest PR #3854)
tunes the resulting network (optional, uncomment L348), with resume support
evaluates the final inference time per single frame
runs the demo on this video in real time on the screen.

For now it is CPU only, but it can be adapted to VTA (help needed).

Note that frame resizing and the box and other graphic overlays at display time are orders of magnitude more time-consuming than inference itself, but this is only meant to be a demo/tutorial after all.
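To see how per-frame time splits between resizing, inference and overlay, each stage can be timed separately. The following is a minimal sketch, not taken from the demo script: `time_stages` and the three stage functions are hypothetical stand-ins that only mimic the shape of the pipeline, so the numbers they produce say nothing about the real demo.

```python
import time


def time_stages(stages, frame, repeats=10):
    """Run the (name, fn) stages in order and return mean seconds spent in each."""
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(repeats):
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)  # each stage consumes the previous stage's output
            totals[name] += time.perf_counter() - start
    return {name: total / repeats for name, total in totals.items()}


# Hypothetical stand-ins for the real resize / inference / overlay steps.
def resize(frame):
    return [row[::2] for row in frame[::2]]


def infer(frame):
    return sum(sum(row) for row in frame)


def overlay(result):
    return "boxes: %d" % result


frame = [[1] * 64 for _ in range(64)]
timings = time_stages(
    [("resize", resize), ("inference", infer), ("overlay", overlay)], frame
)
for name, seconds in timings.items():
    print("%-10s %.3f ms" % (name, seconds * 1e3))
```

Swapping the stand-ins for the actual resize, module run and drawing calls would give a per-stage breakdown for a real frame.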

thierry · August 2, 2019, 9:39pm

Very neat; this will be a great starting point to target VTA. I’ll start to take a look at the operators so we can make sure that we have proper coverage on VTA.

What are you running the demo on?

cbalint13 · August 3, 2019, 12:08pm

@thierry, @kevinyuan,

Updated the script to revision 4 (works better, also tested with ‘yolov3-tiny’ and ‘yolov3’ & ‘yolov2’).
Also, for local CPU there is now a generic tuning file exposed for each layer (no AVX2; that would be much faster).
Except for the video file, all downloads happen automatically in the script, which is useful if we want an end-to-end tutorial.
It is possible to use a camera instead of a video; I’ll add a cfg switch for this in the next revision, 5 (will be back).

It hits ~100ms inference time on CPU (an old IvyBridge); I am curious how it would do on VTA on various targets (de10, pynq, ultra96).
ATM I don’t have any of the mentioned boards, but I am looking forward to adding support for artix/kintex7 or the smaller ecp5 (cpu-less) with a softcore (e.g. it could be risc-v if that is the only way).

thierry · August 8, 2019, 5:46pm

Thank you @cbalint13; I’d like to try on the pynq and VTA. Will update you when I get something running.

hzhang · October 17, 2019, 3:44pm

I’m running this demo on mac and I found that libdarknet_mac2.0.so is missing from https://github.com/dmlc/web-data/tree/master/darknet/lib

What I’m trying to do is clone the darknet repo from https://github.com/pjreddie/darknet, build the darknet project on mac to get libdarknet.so, and rename it to libdarknet_mac2.0.so to see if it works in this tiny yolo v3 demo. If any of you have done this before, or have any suggestions on how to run this demo on mac, please advise.

Really appreciate your help!
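The build-and-rename step described above could be scripted roughly as follows. This is only a sketch: the `install_local_darknet` helper and the directory layout are assumptions made for illustration, and the thread does not confirm that a locally built library actually loads in the demo. The dry run uses a placeholder file so it runs without a darknet checkout.

```python
import shutil
import tempfile
from pathlib import Path


def install_local_darknet(build_dir, dest_dir, target_name="libdarknet_mac2.0.so"):
    """Copy a locally built libdarknet.so to the file name the demo looks for."""
    source = Path(build_dir) / "libdarknet.so"
    if not source.is_file():
        raise FileNotFoundError("build darknet first: %s is missing" % source)
    dest = Path(dest_dir) / target_name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, dest)
    return dest


# Dry run: a placeholder file stands in for the real build output.
with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp) / "darknet"
    build_dir.mkdir()
    (build_dir / "libdarknet.so").write_bytes(b"placeholder")
    installed = install_local_darknet(build_dir, Path(tmp) / "lib")
    installed_name = installed.name
    installed_ok = installed.is_file()
    print(installed_name, installed_ok)
```

With a real checkout, `build_dir` would point at the darknet source directory after the build, and `dest_dir` at wherever the demo expects to find the library.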

hzhang · October 17, 2019, 4:47pm

I ran into this issue (link), and I found that downloading libdarknet_mac2.0.so from libdarknet_mac2.0.so solves it.

thierry · October 17, 2019, 7:05pm

Ah yes, that is an issue with the tutorial. Can you fix the download path and submit a fix in a PR?

hzhang · October 18, 2019, 12:28am

Sure. I also found some other Mac compatibility issues in the darknet tutorial code. Let me fix all of them in one PR.

hzhang · October 22, 2019, 3:59pm

Hi thierry, I just created a PR regarding the path fix, but I didn’t find a place to assign the reviewers. Could you help me with this PR? Thanks!

PS. Please ignore the other compatibility issues I mentioned in the above post. These issues are in the tiny yolov3 quantization demo, not in from_darknet.py.

hjiang · July 10, 2020, 7:30pm

Hi there, just an update: VTA can now support Yolov3-tiny. Here (https://tvm.apache.org/docs/vta/tutorials/frontend/deploy_detection.html) is the YoloV3-Tiny tutorial for VTA.

kilsenp · February 22, 2021, 9:12pm

Hi hjiang, this link is no longer available and I cannot find the tutorial elsewhere. Does the Yolov3 to VTA tutorial still exist somewhere?

Thanks

hjiang · February 24, 2021, 7:05pm

@kilsenp, yolov3 tiny on VTA broke after a recent TVM change. The tutorial was removed because of that issue, so the link no longer works. This issue will be addressed later.

aleczhanshi · August 5, 2021, 2:22pm

Hi @hjiang, is the yolov3 tiny on VTA available now anywhere?

hjiang · August 12, 2021, 3:10am

Hi @aleczhanshi, the VTA yolov3-tiny issue is already fixed. You can find the tutorial in this PR: [VTA] Make vta graph_pack compatible with latest TVM, and bring back object detection tutorials. by huajsj · Pull Req

ptl88 · September 3, 2021, 3:38am

Hi @hjiang, thank you for the yolov3-tiny tutorial. Is the yolov3 model also available on VTA? Thanks.

hjiang · September 8, 2021, 12:01am

@ptl88, there is no VTA Yolov3 tutorial available right now. If you would like to try Yolov3 on an FPGA such as Xilinx UltraScale, the Vitis backend is an option.