[TVM Compiler] Performance measurement for TVM compiler pipeline

Hi all,
recently I was running a huge model, which I mentioned in this topic: [TVM Compiler] “stack overflow” happens on Linux x64 when computation graph is “big”.
I’ve noticed that TVM spends approximately 30% of compilation time inside one part of the scheduler, tvm::schedule::MakeBoundCheck.
It seems like some effort could be spent on improving compilation speed, if someone is interested in this area.
BTW, I’m not able to upload the file with the call graph to this forum, so here is a link: https://github.com/denis0x0D/ml_models/blob/master/call_graph.png

The steps to reproduce manually:
$ valgrind --tool=callgrind python3 lstm.py
$ kcachegrind callgrind.out.<pid>
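
If valgrind is too slow on the big model, a Python-level profile of the same script gives a rough first picture. A minimal sketch, assuming the reproducer is saved as lstm.py in the working directory and writing the stats to a file name of my choosing (compile_profile.out); note that cProfile only sees Python frames, so the C++ time in MakeBoundCheck shows up aggregated under the TVM build/FFI calls, and callgrind is still needed for the per-function C++ breakdown:

import cProfile
import pstats

# Profile the whole reproducer at the Python level (assumed file: lstm.py);
# importing the module runs its top-level code, including the compile step.
cProfile.run("import lstm", "compile_profile.out")

# Print the 20 most expensive calls by cumulative time; the compile step should
# dominate, with the C++ scheduler work hidden inside the TVM build/FFI entries.
pstats.Stats("compile_profile.out").sort_stats("cumulative").print_stats(20)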
Thanks.

I should add that we have measured the compilation time for different model sizes. We think that in recent TVM versions it scales non-linearly (quadratically?) with increasing height of the model graph.

The code for reference is a naive LSTM approach: https://gist.github.com/grwlf/3a8190e2d1ea4bb9bcb021a2b50b7e27,
where num_timesteps is the number of LSTM cells to unroll.
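
To make the scaling easy to check without the full LSTM, here is a minimal sketch (my own toy stand-in, not the gist code) that builds a chain of elementwise te.compute stages of increasing depth and times tvm.build; depth plays the role of num_timesteps, and the absolute numbers will differ, but a superlinear trend should still be visible. It assumes the tvm.te API; older releases expose the same functions directly under tvm.

import time
import tvm
from tvm import te

def build_chain(depth, n=128):
    # A chain of `depth` trivial elementwise stages standing in for the unrolled cells.
    x = te.placeholder((n,), name="x")
    cur = x
    for i in range(depth):
        prev = cur
        cur = te.compute((n,), lambda j: prev[j] + 1.0, name="s%d" % i)
    s = te.create_schedule(cur.op)
    return tvm.build(s, [x, cur], target="llvm")

for depth in (8, 16, 32, 64, 128):
    start = time.perf_counter()
    build_chain(depth)
    print("depth=%d  build time=%.3fs" % (depth, time.perf_counter() - start))

If the build time roughly quadruples each time depth doubles, that points at quadratic behaviour in the lowering/bound-check passes rather than cost that is linear in graph size.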

This is indeed interesting; it would be nice to see what the cause of the problem is.