TVM Monthly - July 2019

As discussed with the TVM PMC, we would like to give a monthly summary of the project so people can get a better sense of what is going on in the community.

Feedback and suggestions are welcome so that we can further improve the report.

Community

The community welcomes new committer Zhi Chen (@zhiics) and new reviewer Steven Lyubomirsky (@slyubomirsky).

The forum grew healthily, reaching 67.3k pageviews and 2.3k user visits in the last month (up from 58.9k and 2.0k the month before).

Features and Improvements

In the previous month, the community has been working on improving the compiler infrastructure, including the pass manager (with documentation and a tutorial), integer-set analysis, and the VM runtime. New efforts on automatic differentiation, symbolic shape support, and dense-sparse operators are paving the way for new deep learning workloads. We improved both the coverage and performance of operators for various frameworks (TFLite, TF, Keras, etc.) and backends (ROCm, CUDA, x86 INT8, Vulkan, etc.), with various optimizations (Winograd, block-sparse, etc.). Relay has seen improvements in debuggability with better IR printing. In terms of custom accelerator support, VTA can now be compiled to Intel FPGAs, and the runtime has been updated to support PCI-E based FPGA cards. The Chisel version and the TSIM compiler are becoming more mature. Last but not least, micro TVM has been merged, paving the way toward minimal deployments on bare-metal microcontrollers.
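
As a quick illustration of the new pass manager, below is a minimal sketch of composing Relay passes with `Sequential`, assuming the 0.6-era Python API (`relay.Module`, `relay.build_config`); treat it as an outline rather than a definitive recipe:

```python
import tvm
from tvm import relay

# A toy Relay function f(x) = x + x, wrapped in a module.
x = relay.var("x", shape=(1, 16), dtype="float32")
func = relay.Function([x], relay.add(x, x))
mod = relay.Module.from_expr(func)

# Sequential composes individual passes and runs them in order.
seq = relay.transform.Sequential([
    relay.transform.SimplifyInference(),
    relay.transform.FoldConstant(),
    relay.transform.EliminateCommonSubexpr(),
])

# Passes whose opt_level is above the current context's are skipped,
# so raise opt_level to make sure all three passes run.
with relay.build_config(opt_level=3):
    mod = seq(mod)

print(mod)
```

The pass manager documentation and tutorial mentioned above cover the full API, including writing custom module- and function-level passes.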

More improvements along with details are listed below.

Compiler Improvements

  • Migrate Relay passes to pass manager (#3430, #3312)
  • Transitioning low-level IR away from HalideIR (#3533, #3535)
  • Integer set analysis/simplifier improvement (#3368, #3503, #3504, #3502, #3479, #3568)
  • Symbolic shape support (broadcast op #3389)
  • Higher order continuation passing style (#3456, #3485)
  • Tags for ADT constructors (#3369)
  • Frontend changes (get_workload - #3483)
  • IR dumping for debugging (#3493)
  • Pretty printer and parser roundtrip (#3460, #3536)
  • Vulkan IR builder (bool to float #3513)
  • Relay type checking (conv2d weight dimension #3511, any shape #3221)
  • Relay gradient registration (clip #3509, max_pool2d and avg_pool2d #3601)
  • Relay VM (pattern matching #3470, port to python #3391, serialization #3647)
  • Relay Module enhancements (remove free variables #3476)
  • LLVM DWARF debug information (#3420)
  • Printer for Layout/BijectiveLayout (#3582)
  • Add op size (#3094)
  • Relay AD algorithm (#3585)
  • Relay Training - allow gradient to return a tuple (#3600), numerical gradient check (#3630)
  • Type inference escape hatch (#3571)
  • Making iterators compatible with constructors of STL containers (#3624)
  • ChangeBatch pass for batched VTA compilation (#3656, #3660)
  • ROCM support (llvm printing #3662, ld.lld finding #3664, save to file #3665)

Operator Support and AutoTVM

  • MAC count support for conv_2d_transpose (#3469)
  • x86 TOPI (roi_align #3475, conv2d_transpose #3491)
  • Intel INT8 (dilation in conv2d #3510, type checking #3516)
  • Graph tuner for multiple subgraphs (#3490)
  • Reinterpretation of tensor elements (#3599)
  • Sparse-Dense for block-sparse multiplication (#3566)
  • Winograd matrix computation (#3553)
  • CUDA schedule for pool_grad (#3622), group_conv2d (#3663)

User Interface and Frontend

  • TFLite operator support (pack #3521, split #3520)
  • Keras operator support (permute, softmax #3618)
  • TF operator support (BatchMatMul #3634)
  • TF fix where output index is ignored (#3622)

Runtime

  • Android runtime argsort support (#3472)
  • GraphRuntime enhancements (set_input_zero_copy #3416)
  • Threadpool: make spin_count configurable (#3577)
  • RPC worker children termination (#3669)

Documents, Test, and Build

  • Docker updates: conda package (#3344), requests and pillow (#3495), Android demo (#3499), rat install (#3527), ARM support (#3546), LLVM (#3590)
  • Tutorial migration to Python3 (#3498)
  • Android RPC README (#3500)
  • Relay-to-Python testing (#3156)
  • Code refactoring/removal (#3523, #3667)
  • Documentation for Relay opcode (#3522)
  • Tutorial for pass manager (#3515)
  • Temporary CI disabling (#3569)
  • Minimum version of Python in docs (#3588)
  • Relay pass infra (#3583)
  • Zero-rank testing (#3612)
  • CMake compilation (#3611, #3650, google test #3628)
  • X86 Autotune tutorial improvements (#3609)
  • Standalone wheel build for TOPI (#3657)
  • YOLOv3 tiny Darknet tutorial (#3674)
  • SSD doc to avoid confusion (#3677)
  • Fixing performance issues in PassUpDomain when fusing and splitting axes (#3073)

Accelerator and Microcontroller Support

  • VTA fast simulator statistics (#3481)
  • TSIM improvements and fixes (#3505)
  • Chisel VTA enhancements and fixes (32bit support #3558, alu instruction generation #3592, coherence support #3593, separate types #3605, tensor issue/commit #3637, uop load request #3643, uop dma requests #3654)
  • VTA Runtime refactor for non-shared memory FPGAs (#3590)
  • MicroTVM (#3227)
  • VTA HLS codebase refactor for Ultra96 (#3496)
  • VTA support for batched inference (#3661)
  • VTA bitstream compilation for Intel FPGA (#3494)

Fixes

  • Runtime fix for custom datatypes (#3471)
  • Relay build module warnings (#3452)
  • Relay partial evaluator (#3482)
  • Pynq AutoTVM tracker (#3497, #3578)
  • A-normal form test (#3525)
  • Lint issues (#3519, #3615)
  • Any shape testing (#3528)
  • Android posix_memalign (#3532)
  • Quantization add_rewrite and UnifyDTypeScale (#3534)
  • Bound inference fix (#3526)
  • TensorFlow NCHW data format (#3514)
  • First order gradient (#3550)
  • JS load module example (#3556)
  • Build error (#3552)
  • Relay VM debug statements (#3565)
  • C++ lambda expr (#3570)
  • Handling of tempdir if subprocess is killed (#3574)
  • Remove tabs in Chisel source (#3603)
  • Relay VM DataTypeObject (#3604)
  • Removing prints (#3616)
  • Average Pool2D Bug (#3607)
  • Missing header in cuda_device_api.cc (#3621)
  • TensorFlow frontend fix where output_shape is None (#3632)
  • Winograd accuracy fix (#3644)
  • Fix comment (#3646)
  • Zero-input op fix for recursive traversals (#3623)
  • Python 3.5 compatibility (#3675)

People Who Reviewed Pull Requests:

Note: The format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community’s view about the significance of contributions.

tqchen (101), jroesch (31), tmoreau89 (24), yzhliu (23), zhiics (23), wweic (20), MarisaKirisame (14), kevinthesun (14), vinx13 (13), slyubomirsky (12), junrushao1994 (12), FrozenGene (11), masahi (10), liangfu (10), merrymercy (9), icemelon9 (9), eqy (7), vegaluisjose (7), u99127 (6), ajtulloch (5), sgrechanik-h (5), grwlf (4), ZihengJiang (3), srkreddy1238 (3), mshawcroft (3), hlu1 (3), apivovarov (3), yidawang (3), antinucleon (3), weberlo (3), anijain2305 (3), kazum (2), yongwww (2), denis0x0D (2), siju-samuel (1), Laurawly (1), Huyuwei (1), were (1), cbalint13 (1), huajsj (1), yinghai (1), gussmith23 (1), Mutinifni (1), reminisce (1), kaitingwang (1), altanh (1), Hzfengsy (1), kparzysz-quic (1)

People Whose Pull Requests are Updated:

Note: The format is name (number of activities, area list).

MarisaKirisame (43, relay), tqchen (17, arith), vegaluisjose (11, vta), anijain2305 (9, qnn), zhiics (8, relay), tmoreau89 (8, vta), wweic (8, relay), vinx13 (6, topi, relay), ajtulloch (6, relay, topi), u99127 (6, relay), icemelon9 (5, vm), jwfromm (5, relay, topi), jroesch (4, relay), kevinthesun (4, autotvm), yongwww (4, relay), ruslo (4, android), abergeron (4), weberlo (4, utvm), cbalint13 (4, model), slyubomirsky (3, relay), were (3, tvm ir), lixiaoquan (3, tf), sgrechanik-h (3, allocator), liangfu (3, vta), t-vi (3, rocm), ZihengJiang (2, quantize), yzhliu (2, codegen), nhynes (2, sgx), yidawang (2, topi), huajsj (2, vta), yinghai (2, runtime), vv1133 (2, tf), merrymercy (1, ir), srkreddy1238 (1, resize), siju-samuel (1, rpc), Laurawly (1, doc), eqy (1, quantization), mshawcroft (1), hlu1 (1), zhreshold (1), xqdan (1), FrozenGene (1, tf), hcho3 (1), kaitingwang (1, relay), altanh (1, relay), BenjaminTu (1, vta), Hzfengsy (1, arith), YPBlib (1), bulanova-huawei (1), ghostplant (1), kparzysz-quic (1, runtime), peterjc123 (1), sf-wind (1), tristan-arm (1), zacario-li (1)

Hi @thierry @vegaluis @liangfu @hjiang

In the Features and Improvements part it’s mentioned that the VTA runtime has been updated to support PCI-E based FPGAs. However, there is a recent discussion about adding this support in the following link.

It would be great if you could share the current status of this.

Thank you

Hi @elnaz92, there were some changes in the VTA runtime that would pave the way towards easier support of PCI-E based FPGAs, namely recognizing the difference between shared-memory and PCI-E DMA use cases. The rest of the support is not there yet, but as you found, this RFC should bring some of the necessary changes to get an end-to-end story on cloud FPGAs.
