TVM Monthly - Dec 2019

As discussed with the TVM PPMC, we would like to give a monthly summary of the project so people can get a better sense of what is going on in the community.

Feedback and suggestions are welcome so that we can further improve the report.

Community

The second annual TVM conference took place at the University of Washington in Seattle. The program included talks from across the community (including AWS, Facebook, Alibaba, Cornell, Microsoft, ARM, Xilinx, OctoML, Qualcomm, Stanford, and Intel). Videos and slides are available here: https://sampl.cs.washington.edu/tvmconf/

The community also welcomes new PPMC member Jared Roesch (@jroesch) and new reviewer Neo Chien (@cchung100m).

The forum received 90k pageviews and 2.7k user visits in the last month (down from 103k and 3.3k in November, mostly due to the holiday season).

Features and Improvements

In the previous month, the community released TVM v0.6, fully deprecated NNVM in favor of Relay, implemented bring-your-own-codegen in Relay and the Relay VM, enabled standardized graph module export, and extended uTVM to support its first microcontroller platform, the ARM STM32F746XX. The codebase refactor for the unified object system was carried out, several operators were added (including 3D operators), INT8 GEMM performance was improved, and new schedules were added for ROCm and ARM NHWC. The TensorFlow-to-Relay, TFLite-to-Relay, and ONNX-to-Relay frontends have seen increased operator coverage. A handy layout transformation pass was added to Relay. The RPC runtime was extended to support TFLite model evaluation on low-power devices. Several enhancements were added to the cycle-accurate TSIM simulator. Finally, numerous bug fixes were contributed by the community at large.
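For context on the layout transformation work, converting a tensor between NCHW and NHWC layouts amounts to an axis permutation. The sketch below is a plain NumPy illustration of that idea, not TVM's actual ConvertLayout pass; the function names are hypothetical:

```python
import numpy as np

def nchw_to_nhwc(x):
    """Permute a 4-D tensor from NCHW to NHWC layout."""
    return x.transpose(0, 2, 3, 1)

def nhwc_to_nchw(x):
    """Permute a 4-D tensor from NHWC back to NCHW layout."""
    return x.transpose(0, 3, 1, 2)

x = np.random.rand(1, 3, 224, 224)          # NCHW
y = nchw_to_nhwc(x)                         # shape (1, 224, 224, 3)
assert np.array_equal(nhwc_to_nchw(y), x)   # round-trip is lossless
```

In Relay, the same permutation is inferred and inserted at the graph level so that operators can run in their preferred layout without manual transposes.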

More improvements along with details are listed below.

Compiler Support

  • Add function attributes to IR hash (#4479)
  • Intrinsic dispatching with OCML instead of LLVM for ROCm (#4499)
  • IR readability enhancement (#4501)
  • Add bfloat16 typeflag support (#4525)
  • External codegen support in Relay (#4482) + VM (#4544)
  • Deprecating NNVM (#4535, #4562, #4565, #4571)
  • Cythonize NDArray.copyto (#4549)
  • Add ConvertLayout pass in Relay (#4335, #4600)
  • Relay passes lookup overhead optimization (#4594)
  • Unified Object System runtime refactor (#4578, #4581, #4603)
  • VM profiler: sort VM stats by time (#4601)

Operator Support and AutoTVM

  • Add strided_set operation (#4303)
  • Add shape function for zero, zeros_like, ones, ones_like (#4448), tile (#4441)
  • Add support for conv3d (#4400), pool3d (#4478), 3d upsampling ops (#4584)
  • Add group convolution for VTA (#4421)
  • Add ROCm schedules for TOPI (#4507)
  • Add 1d deconvolution op (#4476)
  • Allow batch matmul to be fused into injective ops (#4537)
  • Add native depthtospace and spacetodepth operators (#4566)
  • NHWC conv2d schedule templates for ARM (#3859)
  • Int8 GEMM performance enhancement using cuBLAS (#4550)
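As context for the new depth_to_space operator, the sketch below shows its DCR-mode semantics in plain NumPy, assuming NCHW input. This is an illustration of the rearrangement, not TVM's implementation:

```python
import numpy as np

def depth_to_space(x, block):
    """Rearrange channel blocks into spatial blocks (DCR mode, NCHW).

    (N, C*block*block, H, W) -> (N, C, H*block, W*block)
    """
    n, c, h, w = x.shape
    tmp = x.reshape(n, block, block, c // (block * block), h, w)
    tmp = tmp.transpose(0, 3, 4, 1, 5, 2)  # N, C', H, b1, W, b2
    return tmp.reshape(n, c // (block * block), h * block, w * block)

x = np.arange(16).reshape(1, 4, 2, 2)
y = depth_to_space(x, 2)
# y has shape (1, 1, 4, 4): the four 2x2 channel planes are
# interleaved into a single 4x4 spatial plane
```

space_to_depth is the inverse rearrangement, moving spatial blocks back into channels.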

User Interface and Frontend

  • TFLite parser support for transpose_conv (#4440), unpack (#4447)
  • LLDB pretty printers for relay (#4453)
  • ONNX to Relay converter op support: expand op (#4483), auto_pad in conv and convtranspose (#4563)
  • TF to Relay converter op support: bilinear and neighbour implementation refactor (#4504), max_pool3d (#4551), conv2d_transpose with “same” padding support for larger than 1x1 kernels
  • Remove unnecessary cast of constants in ONNX converter (#4573)

Runtime

  • Add ADTObject POD container type (#4346)
  • Add CUDNN conv3d support (#4418)
  • Update RPC runtime to allow remote module as arg (#4462)
  • TFLite RPC runtime (#4439)
  • Refactoring system lib and dso lib into library module (#4481)
  • Standardized graph runtime export (#4532)

Documents, Test, and Build

  • Add benchmark log format doc (#4366)
  • Add AMD codegen unit tests (#4509)
  • Add Ninja build system to installation docs (#4554)
  • Add v0.6 release (#4558)

Accelerator and Microcontroller Support

  • uTVM support for ARM STM32F746XX boards (#4274)
  • Speedup TSIM with multi-threading (#4491)
  • Improve TSIM virtual memory mapping (#4545)
  • Cleanup legacy verilog code (#4576)

Fixes

  • Doc/comment fixes (#4452, #4463, #4469, #4493, #4397, #4580, #4585, #4591)
  • MSVC / Windows fixes (#4455, #4569)
  • Fix Makefile for howto_deploy (#4457)
  • Fix GCC 4.8 compatibility (#4461)
  • Fix search path to build libtvm_topi.so (#4467)
  • Fix for conv2d_transpose CUDA compilation (#4472)
  • Fix for LLVM 10.0 codegen (#4480, #4515)
  • Fix alter op layout when calling global var (#4454)
  • Fix float2half_rn support for cuda compute capabilities < 53 (#4489)
  • Fix compile errors for OpenCL backends (#4492)
  • Fix serialization precision loss (#4503)
  • Fix hybrid script to support array of tensors (#4494)
  • Fix annotation for multiply op (#4458)
  • Fix Dockerfile for linter CI (#4506)
  • Fix TF resize for dynamic size models (#4510)
  • Fix bias_add gradient (#4516)
  • Fix tanH unit test function call (#4517)
  • Fix extra reshape parameter for ONNX (#4524)
  • Fix crash caused by empty TOPI config (#4520)
  • Fix ONNX shape op type to use int64 (#4528)
  • Fix crash in TSIM virtual memory driver (#4527)
  • Replace deprecated python library in setup script (#4533)
  • Fix NMS max_output_size loop (#4541)
  • Fix style in IR mutator and IR visitor (#4561)
  • Fix compiler warning (#4559)
  • Fix to get end to end inference on Chisel VTA (#4574)
  • Fix LLVM build by adding missing intrinsics headers (#4575)
  • Fix context creation in quantization (#4582)
  • Fix NDArray SaveDLTensor signature (#4586)
  • Fix dense pack schedule for x86 (#4539)
  • Fix for broadcast tensor of scalar type (#4577)
  • Datatype refactor (#4513, #4560)
  • Add const qualifiers for NDArray container (#4590)
  • Fix TF <= 1.12 compatibility (#4593)
  • Fix for graph debug runtime (#4598)
  • Disable copy constructor for external codegen (#4597)
  • Make ADT tag signed (#4605)

People Who Reviewed Pull Requests:

Note: the format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community's view of the significance of contributions.

tqchen (22), zhiics (7), icemelon9 (5), masahi (5), yongwww (5), liangfu (5), inadob (5), anijain2305 (4), apivovarov (4), liangdzou (4), optima2005 (4), kevinthesun (3), FrozenGene (3), cchung100m (3), petrex (3), yzhliu (2), wweic (2), eqy (2), abergeron (2), jwfromm (2), BenjaminTu (2), kice (2), mbarrett97 (2), dmakarov (2), wyc-ruiker (2), jmorrill (2), zhuochenKIDD (2), ZihengJiang (1), MarisaKirisame (1), Laurawly (1), tmoreau89 (1), nhynes (1), kazum (1), jroesch (1), slyubomirsky (1), soiferj (1), weberlo (1), junrushao1994 (1), comaniac (1), t-vi (1), u99127 (1), kimishpatel (1), alexgl-github (1), Hzfengsy (1), vmiheer (1), reminisce (1), jackwish (1), spectrometerHBH (1), JammyZhou (1), SWu (1), lhutton1 (1), cylinbao (1), imharrywu (1), uenoku (1), HisiFish (1), leandron (1), Leo-arm (1), aksarben09 (1), tkclimb (1), abuccts (1), ElaineBao (1), qingyunqu (1), anwang2009 (1), KnowingNothing (1)

People Whose Pull Requests are Updated:

Note: The format is name(number of activities, area list)

tqchen (54), zhiics (24), kevinthesun (21), masahi (18), ZihengJiang (15), yzhliu (13), vinx13 (11), tmoreau89 (11), junrushao1994 (11), comaniac (11), wweic (9), FrozenGene (9), icemelon9 (8), yongwww (7), cchung100m (6), jwfromm (6), MarisaKirisame (5), apivovarov (5), soiferj (5), u99127 (5), optima2005 (3), srkreddy1238 (2), anijain2305 (2), liangfu (2), kice (2), merrymercy (1), Laurawly (1), nhynes (1), PariksheetPinjari909 (1), jroesch (1), Huyuwei (1), slyubomirsky (1), vegaluisjose (1), were (1), ajtulloch (1), weberlo (1), antinucleon (1), petrex (1), ehsanmok (1), Hzfengsy (1), yinghai (1), umangyadav (1), jackwish (1), yuluny2 (1), TaoLv (1), qingyunqu (1)
