TVM Monthly - Oct 2019

As discussed with TVM PMC, we would like to give a summary of the project per month, so people can get a better sense of what is going on in the community.

Feedback and suggestions are welcome so that we can further improve the report.

Community

The community welcomes new reviewer Jon Soifer (@soiferj) and committer Andrew Tulloch (@ajtulloch).

The forum grew healthily, with 79.1k pageviews and 2.4k user visits in the last month.

Meetups will be held next month in the Bay Area and Shanghai.

Features and Improvements

In the previous month, the community made good progress on several fronts. Here are a few highlights:

  • With legalize pass and operator/backend support, TVM is now able to consume pre-quantized INT8 models.
  • The community is pushing hard on enabling training via TVM. This includes more operator-level gradient support and a Relay pass to insert checkpoints.
  • TVM now has better support for dynamic shapes. For example, broadcast operations can handle symbolic shapes automatically, the TensorFlow frontend now supports tensor array operators, and graph dispatching is a work in progress (#4118).
  • TVM node system has been refactored to unify runtime::Object and Node in AST.
  • Performance on NVIDIA GPUs will be improved via TensorCore support.
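
The dynamic-shape highlight above can be illustrated with a small, hedged sketch of symbolic broadcasting in plain Python. `ANY`, `broadcast_dim`, and `broadcast_shape` are hypothetical names for illustration only, not TVM APIs; Relay's actual handling of `Any` dimensions lives in its type relations.

```python
# Hypothetical sketch of NumPy-style broadcasting where a dimension
# may be symbolic (unknown until runtime). Not TVM code.
ANY = object()  # stand-in for a symbolic dimension


def broadcast_dim(a, b):
    """Broadcast two dimensions; either may be the symbolic ANY."""
    if a == 1:
        return b
    if b == 1:
        return a
    if a is ANY or b is ANY:
        # Keep the concrete side when one exists; a runtime check would
        # verify that the symbolic dimension is actually compatible.
        return a if b is ANY else b
    if a != b:
        raise ValueError("incompatible dimensions: %r vs %r" % (a, b))
    return a


def broadcast_shape(s1, s2):
    """Right-aligned, NumPy-style broadcast of two shapes."""
    out = []
    for i in range(max(len(s1), len(s2))):
        d1 = s1[-1 - i] if i < len(s1) else 1
        d2 = s2[-1 - i] if i < len(s2) else 1
        out.append(broadcast_dim(d1, d2))
    return tuple(reversed(out))
```

The key point is that a symbolic dimension does not block shape inference: the result stays as precise as the inputs allow, and compatibility checks that cannot be resolved statically are deferred to runtime.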

More improvements along with details are listed below.

Compiler Improvement

  • [NODE][REFACTOR] Refactor reflection system in node. (#4189)
  • Unify node system and object (#4161, #4115, #4128)
  • [Relay][Refactor] Rename Datatype to ADT (#4156)
  • [Relay][VM] Add more passes to VMCompiler (#4058)
  • [tvm][any] broadcast with values other than one (#3967)
  • [relay][vm] Separate VM runtime with executable (#4100)
  • [Relay] fix exponential blowup in interpreter (#3559)
  • [Relay] Fix memory leak in the interpreter (#4155)
  • [Relay] Add Python type functor and tests (#4209)
  • [rpc] use callback func to do send & recv (#4147)
  • Add lift_if_then_else pass (#3865)
  • [VTA][Chisel] TSIM VTA Source Refactor (#4163)
  • [Relay][Training] Add missing gradient check to gradient pass (#4169)
  • [ARITH] Use floordiv for the deduce bound (#4025)
  • [Simplifier] Rewrite simplification rule to eliminate unnecessary conditionals. (#4076)
  • Optimizing autotvm task extraction speed (#4138)
  • Decrease the complexity of CalcDep from exponential to linear (#4053)
  • [IR] Make iterators compatible with constructors of STL containers (#3624)
  • Force code_object_v2 for amd gpu backend (#4099)

Quantization

  • Infrastructure to support pre-quantized models (QNN) (#3971).
  • [Relay][AlterOp] NHWC to NCHWc support for Pool, concatenate, sum. (#4059)
  • [QNN][TFLite] Parsing QNN Add op. Adding MobilenetV2. (#4142)
  • [TOPI][x86] Cascade lake support. (#4123)
  • [TOPI][x86] Legalize - Support int8xint8 convolution to use VNNI inst (#4196)
  • Qnn dequantize with min max using Mxnet flavor to support Mxnet prequantized models. (#3945)
  • Improve the lowering of Qnn Dense (#4213)
  • Adding support for dequantizing from int32 to float32. (#4130)
  • [QNN] Refactor fixed point multiplication in requantize (#4073)
  • [Relay][Quantize] Use fixed point mulplications (#4160)
  • Add support for quantized multiply to Relay (#4141)
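
The requantize work above replaces a floating-point rescale with an integer fixed-point multiply. A hedged, pure-Python sketch of the general gemmlowp-style technique (not TVM's actual implementation) follows; the scale is decomposed into a 31-bit integer multiplier and a shift.

```python
import math


def to_fixed_point(scale):
    """Decompose a positive float scale into (multiplier, exp) so that
    scale is approximately multiplier * 2**(exp - 31)."""
    mant, exp = math.frexp(scale)          # scale = mant * 2**exp, 0.5 <= mant < 1
    multiplier = round(mant * (1 << 31))
    if multiplier == (1 << 31):            # rounding pushed mant to 1.0
        multiplier //= 2
        exp += 1
    return multiplier, exp


def requantize(q, in_scale, out_scale):
    """Rescale an integer value from in_scale to out_scale using only an
    integer multiply and a shift (rounding to nearest)."""
    multiplier, exp = to_fixed_point(in_scale / out_scale)
    total_shift = 31 - exp                 # assumes a modest scale ratio, so total_shift >= 1
    acc = q * multiplier
    rounding = 1 << (total_shift - 1)
    return (acc + rounding) >> total_shift
```

This keeps the hot path free of floating-point arithmetic, which is what makes pre-quantized INT8 models efficient on integer-only hardware paths.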

Performance

  • [TOPI][X86] Pool operator parallel support. (#4090)
  • Improve layout for several operators (#4103, #4040, #4080)
  • [Relay][VM] Fix constant folding issue in VM compiler (#4077)
  • [relay][vm] Reuse allocated device memory (#4170)
  • [Runtime] Enable option to use OpenMP thread pool (#4089)
  • [PERF] Parallelize reduction for CPU (#4158)
  • [TOPI] Tunable Template for Conv2D HWCN on CUDA (#4168)
  • [TOPI] Add valid auto tvm for Intel Graphics (#4078)
  • [TOPI] FIFO buffer op, to accelerate sequence modeling with dilated convolutions (#4039)
  • TensorCore Support using Intrinsic (#4136)
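
The FIFO buffer op (#4039) speeds up streaming sequence models by keeping a rolling window of recent frames instead of recomputing over the whole history. Conceptually (a hedged sketch; `FIFOBuffer` here is a hypothetical illustration, not the TOPI operator):

```python
from collections import deque


class FIFOBuffer:
    """Rolling window of the last `length` frames: each push drops the
    oldest frame and appends the new one, returning the current window."""

    def __init__(self, length, init=0.0):
        self.buf = deque([init] * length, maxlen=length)

    def push(self, frame):
        self.buf.append(frame)
        return list(self.buf)
```

A dilated convolution over such a window only ever touches the buffered frames, so per-step cost stays constant regardless of sequence length.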

Operator Support

  • [Relay][Frontend][TF] Add tensor array ops (#3798)
  • [TF][Op] Op where (#4045)
  • [TOPI]Add op argwhere (#3994)
  • [Relay] crossentropy_with_logits and its gradient (#4075)
  • Add gradient for log-softmax (#4069)
  • [Relay][Training] Add gradient for Crossentropy (#3925)
  • [Relay][Pass] Count MAC for BatchMatMul (#4157)
  • [topi] enable fp16 sort for arm (#4084)
  • [Relay][Training] Add and fix gradients (#4126)
  • [Relay][Op] Enhance Upsample Operator to support float scales (#4206)
  • [Relay][Op] Add instance norm op (#4004)
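
Several of the training-related PRs above add operator gradients. A standard way to sanity-check such a gradient is to compare the analytic formula against central finite differences; here is a hedged pure-Python sketch for cross-entropy with logits (illustrative only, not TVM's gradient-pass code):

```python
import math


def log_softmax(x):
    """Numerically stable log-softmax of a list of logits."""
    m = max(x)
    lse = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - lse for v in x]


def cross_entropy_with_logits(x, target):
    """Negative log-likelihood of the target class."""
    return -log_softmax(x)[target]


def grad_cross_entropy(x, target):
    """Analytic gradient: softmax(x) - one_hot(target)."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [e[i] / s - (1.0 if i == target else 0.0) for i in range(len(x))]


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g
```

The same pattern generalizes to any new operator gradient: if the analytic and numeric gradients agree to within a small tolerance, the registered gradient is very likely correct.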

User Interface and Frontend

  • [QNN][TFLite] Parsing TFLite quantized models. (#3900)
  • [relay][frontend] clean up tf frontend (#3710)
  • [Relay][Topi][TensorFlow][ONNX][Lang] Add support for Any op (#4205)
  • [Relay][Frontend][ONNX] Add support for op Where (#4184)
  • [Relay][TopHub] Add switch to disable TopHub download (#4015)
  • Add parser support for CAST tflite operator (#4096)
  • Add parser support for zeros_like tflite operator (#4042)
  • Add parser support for SUM tflite operator (#4182)
  • Add support for tf.assert (as no-op) and tf.no_op to TF Relay frontend. (#4172)
  • [Relay][Params] Add APIs for storing and retrieving parameters from individual functions. (#4194)
  • Add build_create_shared_func to tvm/contrib/cc.py (#3840)
  • [Relay][Frontend][ONNX] New Operators and Opsets to Support BERT (#4197)

Language, Runtime and Hardware Support

  • [RUNTIME] Separate runtime related contrib into runtime/contrib (#4207)
  • [topi] add ARM v8.2 udot (uint8) support (#3978)
  • [VTA][TSIM] Serial GEMM Application Added (#4082)
  • [codegen] Add multiple operands and function support when using fp16 compilation (#4056)
  • [TOPI] Added support for Mali Bifrost target (#4047)

Documents, Tests, and Build

  • [CI] Pin NNPack pthreadtools version (#4152)
  • [TOPI] Fix flaky testcase for check round (#4211)
  • [CI] Move gpu docker binary to cuda10 (#4229)
  • [CI] use llvm9 for the gpu tests (#4224)
  • [CI] Update GPU docker to cuda10 (#4228)
  • [Relay] Install Relay Prelude program in package install (#4227)
  • [Relay][pass] fix some docs for Relay passes (#3767)
  • [tutorial] Relay pass infra tutorial (#4083)
  • [relay] use time_evaluator for measurement (#4191)
  • [DOCS] Add TensorFlow frontend docs (#4154)
  • [Relay] Improve build error when no lowered funcs are produced (#4132)
  • [llvm] switch to use Align for llvm trunk (#4051)
  • [CUDA] Update have_int8 condition to run on compute capability 7.x devices (#4214)
  • [DOCKER] Pin torchvision==0.4.1 (#4140)
  • [DOCKER] torch install depends on future package (#4098)
  • [CodeGen] Disable -mfloat-abi hard option for LLVM < 6.0 (#4071)
  • Add a python how to example of deploying tvm module with tvm runtime only (#4094)
  • Hide symbols from dependent libraries if HIDE_PRIVATE_SYMBOLS is ON. (#4041)
  • Tutorial: update Building a Graph Convolutional Network tutorial (#4060)
  • [Docs] Add dependency of compilation with LLVM (#4117)
  • [Documentation]Fix example code in comment of tvm.build_module.build() (#4195)
  • [BUILD] Disable utvm standalone runtime by default (#4240)

Bugfix

  • [PYTHON] Fix installation for generated grammar (#4223)
  • [Bugfix] Fix target host for vm compiler (#4057)
  • [Fix][VM] Fix VM invoke with set_params (#4079)
  • [Fix] Fix a few bugs when dtype is fp16 (#4088)
  • [Relay][Frontend][TF] Fix Size operator (#4175)
  • [cmake][ANTLR] Support setting path to ANTLR jar (#4176)
  • Fix infer type of kernel in dense. (#4125)
  • [Relay] Fix match case in Python-side expr functor (#4037)
  • Split adaptive_pool2d_avg into sum and div (#4186)
  • [AutoTVM] Fix Split Factors when no_tail is off (#4044)
  • Fix extent one for the post_stmt in loop partition (#3734)
  • [TOPI] Fix bug in intel graphics auto tune (#4093)
  • [ARITH] Fix lowering of floormod(x, y) != 0 (#4127)
  • [ARITH] Fix the rule y < x && x <= y (#4220)
  • [Bugfix][TF] reset graph after getting tag of savedmodel (#4055)
  • [Fix] Fix the logic of the number of nodes checking in op fusion (#4074)
  • [VTA] hotfix for de10-nano driver (#4081)
  • Fixing tensor not found issue in bitserial operator (#4095)
  • Fix wrong n_trial number in autotvm tutorials’ progress bar if n_trial is larger than config space. (#4070)
  • [PATCH] Fix undefined __floatdihf in libtvmruntime.so on aarch64. (#4119)

People Who Reviewed Pull Requests:

Note: The format is name (number of activities).

Disclaimer: The number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (73), zhiics (45), tmoreau89 (26), anijain2305 (23), icemelon9 (20), MarisaKirisame (18), yzhliu (18), wweic (18), vinx13 (17), yongwww (16), kevinthesun (15), jroesch (14), soiferj (11), FrozenGene (8), jwfromm (8), junrushao1994 (7), u99127 (7), cchung100m (6), merrymercy (5), ajtulloch (5), jackwish (5), ZihengJiang (4), Laurawly (4), weberlo (4), masahi (3), eqy (3), slyubomirsky (3), apivovarov (3), yidawang (3), shoubhik (3), srkreddy1238 (2), vegaluisjose (2), were (2), SWu (2), siju-samuel (1), nhynes (1), PariksheetPinjari909 (1), mshawcroft (1), hlu1 (1), liangfu (1), huajsj (1), cbalint13 (1), comaniac (1), grwlf (1), denis0x0D (1), kimishpatel (1), yinghai (1), yuluny2 (1), lly-zero-one (1), t-vi (1), zxy844288792 (1), broune (1), jianyuh (1)

People Whose Pull Requests are Updated:

Note: The format is name (number of activities, area list).

anijain2305 (19, quantization), tqchen (18, compiler, ci), wweic (12, vm, relay), zhiics (10, relay, vm), MarisaKirisame (7, relay, training), icemelon9 (7, vm, performance), soiferj (7, frontend, op), shoubhik (6, quantization), weberlo (5, vm), yzhliu (4, tvm4j, topi), kevinthesun (4, compiler, topi), comaniac (4, topi, autotvm), vinx13 (3, quantization), mshawcroft (3, build), cchung100m (3, frontend), BenjaminTu (3, vta), inadob (3, frontend), altanh (3, training), umangyadav (3, compiler), zxy844288792 (3, op), mbarrett97 (3, topi, build), broune (3, frontend), Laurawly (2, topi), jroesch (2, relay), apivovarov (2, build), ajtulloch (2, arith), sgrechanik-h (2, arith), spectrometerHBH (2, doc), liaha (2, autotvm, doc), yaochengji (2, relay), merrymercy (1, compiler), siju-samuel (1, rpc), nhynes (1, sgx), tmoreau89 (1, community), eqy (1, app), slyubomirsky (1, relay), yongwww (1, frontend), liangfu (1, runtime), yidawang (1, relay), huajsj (1, vta), jwfromm (1, frontend), cbalint13 (1, topi), petrex (1, compiler), u99127 (1, test), gussmith23 (1, doc), hcho3 (1, topi), reminisce (1, compiler), tristan-arm (1, quantization), paddyhoran (1, rust), Hzfengsy (1, cuda), bindog (1, op), ndl (1, build), jackwish (1, topi), arangrej (1, topi), dati91 (1, doc), cylinbao (1, doc), jianyuh (1, topi), zhiqiu (1, doc), monkeyking (1, doc), ekalda (1, quantization), lhutton1 (1, codegen), optima2005 (1, doc), ZQPei (1, codegen)
