@kevinthesun @haichen let’s continue the discussion on https://github.com/dmlc/tvm/pull/4129 here if you wish.
I want to clarify that this is not a half-baked solution: this technique is known as a polymorphic inline cache, and has existed for a long time. It also covers most dynamic use cases.
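For readers unfamiliar with the term, here is a minimal sketch of the idea in plain Python. This is not TVM code; all names (`PolymorphicInlineCache`, `compile_for_shape`) are illustrative. The point is just that we compile a specialized kernel per concrete shape observed at runtime and cache it, so a model that only ever sees a handful of shapes (e.g. a few batch sizes) pays the compilation cost once per shape and then runs at static-shape speed:

```python
class PolymorphicInlineCache:
    """Caches one specialized callable per concrete input shape."""

    def __init__(self, compile_fn, capacity=4):
        self.compile_fn = compile_fn  # compiles a kernel for a fixed shape
        self.capacity = capacity      # bound the cache, as a real PIC does
        self.entries = {}             # shape key -> specialized callable

    def __call__(self, x):
        key = (len(x),)               # e.g. the batch dimension
        fn = self.entries.get(key)
        if fn is None:
            if len(self.entries) >= self.capacity:
                # Megamorphic site: evict an entry (a real system might
                # instead fall back to a generic, shape-polymorphic path).
                self.entries.pop(next(iter(self.entries)))
            fn = self.compile_fn(key)
            self.entries[key] = fn
        return fn(x)


def compile_for_shape(shape):
    # "Compilation" here is just closing over the shape; in a real system
    # this would invoke the compiler with the batch size fixed.
    n, = shape
    def specialized(x):
        assert len(x) == n
        return [v * 2 for v in x]
    return specialized


double = PolymorphicInlineCache(compile_for_shape)
print(double([1, 2, 3]))  # first call with batch 3: compiles, then runs
print(double([4, 5, 6]))  # same shape: cache hit, no recompilation
```

The design choice being debated is exactly this trade-off: the cache is optimal when the set of observed shapes is small (fixed batch sizes in vision models), and degrades toward the generic path only for truly unbounded shape variation.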
If we count all the models TVM supports (by looking at the tests, the benchmarks, the TVM paper, or the Relay paper), most of them are vision models (densenet, resnet, dcgan, vgg, mlp), and they all have a batch parameter. Some of the rest are lstm and treelstm, or slight modifications of them (rnn, gru, and bilstm, which is a concatenation of two lstms), and there are significantly fewer NLP models than vision models. Finally, among the NLP models there are a few that sort of exist (transformer, bert). Due to various technical reasons surrounding the VM and Any, they are mostly in development and have not been made public yet.
I fail to see why an approach that supports training of all current vision models, but not some models that don't exist in TVM yet, is a limited one.
I also disagree that it will be replaced soon. Suppose graph dispatching comes out tomorrow: this still has an advantage over graph dispatching, because it gives the utmost performance for more than half of the models (the vision ones). Of the rest, half won't be affected, and only for a small minority of them (bert) will this be worse than graph dispatching.
Another concern: I have looked at graph dispatching, and judging by the RFC, it has to make nontrivial changes to Relay, topi, and possibly autotvm. Do you guys have an estimate of how long it will take?