Recently I noticed two performance regression for x86 targets.
After Analyzer CanonicalSimplifier commit 7afbca5691fdb599cd90b043d5a5036e55cae2d6, exec time for gluoncv ssd on c5.9x goes up from 30 ms to 40 ms.
After profiling op, I noticed the following fused ops is much slower:
fused_strided_slice_greater_cast_concatenate_ones_like_multiply_where_strided_sl_1860798206344026416_: 1.6386 ms -> 11.2585 ms
It looks like tvm master disables avx. Now “llvm -mcpu=skylake-avx512” has similar performance of target “llvm”. This problem only happens for relay.