What is "Cannot prove condition when generating the post doubt loop"?


#1

I am getting this warning with tvm.build even though codegen generated the correct code.

Equation:

lambda i, j: tvm.sum(
    tvm.if_then_else(
        tvm.all(
            i + d * (k - w) >= 0,
            i + d * (k - w) < n,
        ),
        X[i, k] * Y[i + d * (k - w), j],
        tvm.const(0, dtype)
    ), axis=k)
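For reference, the rule above can be sketched in NumPy. The shapes and the helper name are hypothetical (the thread gives no concrete sizes); the inner `if` plays the role of the `tvm.all(...)` guard:

```python
import numpy as np

def masked_sum(X, Y, d, w):
    """out[i, j] = sum_k X[i, k] * Y[i + d*(k - w), j],
    with out-of-range accesses to Y's first dimension contributing 0."""
    n, m = Y.shape
    _, K = X.shape
    out = np.zeros((X.shape[0], m), dtype=Y.dtype)
    for i in range(X.shape[0]):
        for k in range(K):
            idx = i + d * (k - w)
            if 0 <= idx < n:          # the tvm.all(...) boundary guard
                out[i] += X[i, k] * Y[idx]
    return out
```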

Warning:

[19:33:42] /usr/tvm/src/pass/loop_partition.cc:545: Cannot prove: ((((((n + 31)/32) - 1) - (((n - 32)/32) + 1)) + 1) >= 0), when generating the post doubt loop

#2

It is just a warning saying that the loop partitioning might not be perfect; it won't affect the correctness of the generated code.


#3

We see this a lot in the Hexagon backend when working with runtime dimensions for tensors. Correctness isn't affected, but codegen is. We've solved some of these issues with additional rules in the canonical simplifier or rewrite simplifier, with performance improvements ranging from negligible to 10x.

If you're feeling brave, you can probably fix this warning by adapting Halide's rewrite rules into rewrite_simplify.cc.


#4

Another question: your example uses if_then_else inside tvm.sum, which has poor performance in my experience. Why do it that way?


#5

Because i + d * (k - w) can be negative or larger than n, the first dimension of Y. Without if_then_else, it would access out-of-bounds memory.
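To make the point concrete with small, made-up sizes (the thread gives none): for some (i, k) pairs the index i + d * (k - w) falls outside [0, n), so the guard is genuinely needed.

```python
# Hypothetical sizes chosen only to illustrate post #5's point.
n, K, d, w = 8, 5, 2, 1

# All indices into Y's first dimension that the reduction would touch.
indices = [i + d * (k - w) for i in range(n) for k in range(K)]

# Some of them are negative, and some are >= n, so an unguarded
# access Y[i + d * (k - w), j] would read out of bounds.
assert min(indices) < 0 and max(indices) >= n
```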


#6

@snowolfhawk, how significant is the slow down? I didn’t notice much of a difference in my case


#7

I use target "llvm -mcpu=core-avx2" on an x86 CPU (Skylake) with a large X/Y shape (1080, 1920).

I think generated code like the following is hard for LLVM to optimize:

for i in range(...):
    for j in range(...):
        for k in range(...):  # reduction axis
            C[i, j] += if_then_else(condition, X[i, k] * Y[i + d * (k - w), j], 0)

The if_then_else within reduction axis k prevents LLVM from vectorizing the loop. There may be no slowdown on CUDA or with small shapes.
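One common way around the conditional in the reduction, not discussed in the thread but worth noting, is to zero-pad Y's first dimension so every index is in range and the guard can be dropped entirely. A NumPy sketch under hypothetical shapes (the pad amounts follow from the index range -d*w .. (n-1) + d*(K-1-w)):

```python
import numpy as np

# Hypothetical shapes and parameters, chosen only for illustration.
n, m, K, d, w = 8, 3, 5, 2, 1
rng = np.random.default_rng(0)
X = rng.standard_normal((n, K))
Y = rng.standard_normal((n, m))

# Reference: the guarded sum, as in the original compute rule.
ref = np.zeros((n, m))
for i in range(n):
    for k in range(K):
        idx = i + d * (k - w)
        if 0 <= idx < n:
            ref[i] += X[i, k] * Y[idx]

# Branch-free variant: zero-pad Y so every shifted index lands in range,
# then drop the condition from the inner loop.
top = d * w                       # deepest negative offset
bot = max(0, d * (K - 1 - w))     # largest overshoot past n - 1
Ypad = np.pad(Y, ((top, bot), (0, 0)))
out = np.zeros((n, m))
for i in range(n):
    for k in range(K):
        out[i] += X[i, k] * Ypad[i + d * (k - w) + top]

assert np.allclose(ref, out)
```

The padded rows are zero, so they contribute exactly what the `tvm.const(0, dtype)` branch did; the inner loop body becomes straight-line code, which is easier for a vectorizer to handle.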


#8

I ran it on GPU. That explains it.