Hello! Suppose my CPU supports AVX2, which operates on 256-bit registers (8 FP32 lanes). Does that mean that in AutoTVM we can always hard-code the schedule like
# (suppose the length of x is 32)
xo, xi = s[A].split(x, factor=8)
s[A].unroll(xo)
s[A].vectorize(xi)
so that we can skip searching over split factors for the x axis? Does a direct vectorization on x, like
s[A].vectorize(x)
generate different assembly and perform differently from the example above?
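For reference, this is the arithmetic I have in mind, written as a plain-Python sketch rather than TVM code. The helper `split_indices` is a hypothetical name of my own; it only models the index mapping of the (xo, xi) loop nest and assumes the extent divides evenly by the factor. It says nothing about what LLVM actually emits.

```python
# Plain-Python model of the split loop nest; no TVM required.
VECTOR_BITS = 256                   # AVX2 register width
ELEM_BITS = 32                      # FP32 element width
LANES = VECTOR_BITS // ELEM_BITS    # FP32 elements per register


def split_indices(extent, factor):
    """Model the (xo, xi) loop nest produced by splitting an axis by `factor`.

    Hypothetical helper: returns one list of flat indices per outer
    iteration, i.e. one list per would-be vector operation.
    """
    # Assumption: the extent is an exact multiple of the factor.
    assert extent % factor == 0
    outer = extent // factor
    return [[xo * factor + xi for xi in range(factor)] for xo in range(outer)]


chunks = split_indices(32, LANES)               # x has length 32
flat = [i for chunk in chunks for i in chunk]   # order in which x is visited
```

With a length of 32, this gives 4 outer iterations of 8 contiguous elements each, which is what I would expect the unrolled xo and vectorized xi to correspond to.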
Thanks in advance!