Hello! Suppose my CPU supports AVX2, which operates on 256-bit registers (8 FP32 lanes each). Does that mean that in AutoTVM we can always fix the schedule like this:

    # (suppose the length of x is 32)
    xo, xi = s[A].split(x, factor=8)
    s[A].unroll(xo)
    s[A].vectorize(xi)

so that we can avoid searching over split factors for the x axis? Does a direct vectorization on x, like

    s[A].vectorize(x)

generate different assembly code and perform differently from the example above?
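
To make the question concrete, here is a minimal plain-Python sketch of the index mapping I have in mind for the split. It is not TVM code: `LANES`, `EXTENT`, and `split_indices` are hypothetical names used only for illustration, and the model assumes FP32 data and an axis length of 32.

```python
# Plain-Python model of splitting an axis of length 32 by a factor of 8.
# Not TVM code: LANES, EXTENT and split_indices are illustrative names only.
LANES = 256 // 32   # 8 FP32 values fit in one 256-bit AVX2 register
EXTENT = 32         # assumed length of the x axis


def split_indices(extent, factor):
    """For each outer iteration xo, list the x indices covered by the inner axis xi."""
    outer = (extent + factor - 1) // factor  # ceiling division
    return [
        [xo * factor + xi for xi in range(factor) if xo * factor + xi < extent]
        for xo in range(outer)
    ]


groups = split_indices(EXTENT, LANES)
flat = [x for group in groups for x in group]
print(len(groups), "outer iterations of", LANES, "lanes each")
```

In this model both schedules touch the same 32 elements in the same order, in 4 groups of 8. What I am unsure about is whether the code generated for the two schedules actually differs.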
Thanks in advance!