We have observed that TF object detection models have very long compilation times. For example, SSD MobileNet takes around 15 minutes and Faster R-CNN around 88 minutes on an EC2 c5.9xlarge instance (Skylake).
To pinpoint the issue, I profiled all the passes (see the sketch after the table for one way to collect such timings). Below is the breakdown for SSD MobileNet.
#############
Top 10 passes
#############
Pass            Invocations    Total (s)     Avg (s)
LambdaLift                1       0.421       0.421
Inline                    2       0.451       0.225
AlterOpLayout             1       0.473       0.473
FuseOps                4565       2.517       0.00055
SimplifyExpr              1       9.701       9.701
ToANormalForm          4562      18.615       0.0041
InferType             10143      56.481       0.0056
ManifestAlloc             3     134.532      44.844
FoldConstant              5     478.084      95.617
EtaExpand               420     484.245       1.153
#############
Parser                          194.344 s
Total (including parser)        947.380 s
All times are in seconds.
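For reference, here is a minimal sketch of one way to collect such a per-pass breakdown using TVM's pass instrumentation. It assumes a TVM build that ships `tvm.ir.instrument.PassTimingInstrument`; the numbers above may have been gathered with ad-hoc timers instead, so treat this only as a reproduction aid.

```python
import tvm
from tvm import relay
from tvm.ir.instrument import PassTimingInstrument

def profile_passes(mod, params, target="llvm"):
    timing = PassTimingInstrument()
    with tvm.transform.PassContext(opt_level=3, instruments=[timing]):
        relay.build(mod, target=target, params=params)
        # render() must be called before the PassContext exits,
        # otherwise the collected timings are gone.
        print(timing.render())
```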
As you can see, FoldConstant and EtaExpand take the majority of the time. FoldConstant is a function pass, so it is invoked once for every function in the module. Each FoldConstant invocation calls the Interpreter, which in turn calls EtaExpand, so EtaExpand's time is effectively counted twice: once on its own and once inside FoldConstant.
The main culprit is CreateInterpreter in the FoldConstant pass. CreateInterpreter makes a copy of almost the whole module. TF SSD models are quite large, so this copy alone causes noticeable overhead. But the real slowdown comes from calling CreateInterpreter again and again, once for each function in the module.
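To make the pattern concrete, here is a conceptual sketch in plain Python. Everything in it (`Module`, `create_interpreter`, the loop structure) is a hypothetical stand-in for the C++ implementation, not TVM source; it only illustrates why per-function interpreter creation scales badly and what hoisting it out of the loop would change.

```python
# Conceptual sketch of the cost pattern -- NOT TVM's actual C++ source.
# The point is the O(num_functions * module_size) blow-up.
import copy

class Module:
    def __init__(self, num_funcs, func_size):
        # A large model: many functions, each with a sizeable body.
        self.functions = {"fn%d" % i: list(range(func_size))
                          for i in range(num_funcs)}

def create_interpreter(mod):
    # Stand-in for FoldConstant's CreateInterpreter: it copies almost the
    # whole module (and in TVM also triggers EtaExpand on the copy, which
    # is why EtaExpand's time is buried inside FoldConstant's as well).
    return copy.deepcopy(mod)

def fold_constant_today(mod):
    # FoldConstant is a function pass: this loop runs once per function,
    # and each iteration rebuilds the interpreter from scratch.
    for _name in mod.functions:
        interp = create_interpreter(mod)  # repeated O(|mod|) copy
        del interp  # ... evaluate constant subexpressions, then discard ...

def fold_constant_hoisted(mod):
    # The direction this analysis points toward: build the interpreter
    # once and reuse it across all functions.
    interp = create_interpreter(mod)
    for _name in mod.functions:
        pass  # evaluate constant subexpressions with the shared interpreter
```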