[DISCUSS] Embed more Bound Information into Var or Expr

tqchen · September 21, 2019, 12:50am

In our optimizations we are heavily dependent on the bound information of the expressions. In particular, the sign information is very useful in deciding simplifications and final lowering.

Currently, we usually have to get the bound information of variables from its context, or simply lost the the bound information during lowering

Example Problem

Consider the following code, which flattens an array.

n = tvm.var("n")
A = tvm.placeholder((n, n), name="A")
B = compute((n * n), lambda i: A[floormod(i, n)][floordiv(i, n)])

for (i, 0, n * n) {
    B[i] = A[floormod(i, n)][floordiv(i, n)]
}

The key question here is how can be lower floormod(i, n) and floordiv(i, n) operations. Due to different division semantics. The most effective way is to directly lower floormod(i, n) => truncmod(i, n) which directly corresponds to i % n operation in C. However, in order to do this, we will need to know that i >= 0 and n >= 0.

The current analyzer can try to get the context information of i from its loop body. However, it also depends on how a developer sets it up. If we forget to populate the context information of i and directly run the logic below, then the analyzer do not know that i is actually non-negative.

FreshAnalzyer().RunLowerFloor(floordiv(i, n))

If we instead carefully populates the context information via recursive visiting, then we can take benefit of i's bound information.

However, in the above example, the even worse case is that the lowered code no longer hints that n is non-negative, which it should be, because it represent shape of a tensor. Without this information, we can no longer efficient generate lowered code for floordiv(i, n).

Note that this is only one example that illustrates the importance of the additional bound information. This example uses the floordiv/mod, so the challenge arises in the final lowering step. If we use other division mode, such as truncdiv which directly corresponds to the low level instruction, the problem will occur during simplication, where the simplifier do not know about the corresponding bounds.

Summary of Issues

To summarize, there are two issues we want to address:

Generate useful bound information for symbolic variables, so we can do effective lowering
Sometimes a developer forget to populate context information, making the analysis less effective(due to lack of these information)

Possible Solutions

The good news is we do have ways to solve the above problems, this RFC is mainly to discuss which way(or ways) we would like to adopt to make our life easier.

Proposal 1: Generate AssertStmt When Binding Shapes

First of all, because we know n is a shape, we already know that it is not negative. We can take benefit of this when generate binding information for shapes in ArgBinder

However, this information was not propagated to the codegen phase. We can generate assertion, via AssertStmt, when binding those variables. So the eventual low level code becomes the following code.

assert n >= 0
for (i, 0, n * n) {
    B[i] = A[floormod(i, n)][floordiv(i, n)]
}

Now the context-dependent analysis can catch the bound information of n and do successful lowering.

Pros of this approach

This is perhaps the most light-weight approach.

Cons of this approach

If we do analysis before the arg binding phase, the non-negative information of n is not available.
If we want to do context-independent analysis, this information is not available.

Proposal 2: Embed Bound Information into Expression

Both this and the next proposal tries to embed bound information directly into the expression, besides the context. For example, we can introduce an NonNegExpr expression, which returns the value and asserts that the value is non-negative.

class NonNegExpr : public Expr {
   public:
     Expr value;
};

When we create tvm.placeholder(shape=(n, n)), we actually store them as
(nonneg(n), nonneg(n)). That is, we actively add non-negative hint wrapper to the stored expression. Because the expression have the non-negative hint, we can do the analysis easily.

Of course, depends on what we want, we could introduce more complicated expressions, such as AssertExpr(cond, value), which returns value while asserting condition is true. Or BoundConstraintExpr(value, const_int_lower_bound, const_int_upper_bound).

Main advantage of embedding information into expression.

Because the bound information is embedded in the expression, we can do analysis without even populate the context information. This is a huge advantage over the previous case
The bound assertions are attached when we pass things into shape constructors, user do not have to think too much about it.

Main drawback of this approach

Because the bound assertion is an nested expression, it is hard for certain simplification cases.

For example, the simplification engine cannot deal with the following case.

nonneg(x * y) / nonneg(x)

Proposal 3: Embed Bound Information into Variable

A more drastic change to the IR structure, is to embed the constraint information into the expression. This can be done by either introduce bound or non-negative flag to the Variable, or make a subclass of Variable

class ShapeVar : public Var {
   public:
     int64_t const_lower_bound = -kPosInf;
     int64_t const_upper_bound = +kPosInf;
};

For loop variables, we can unify IterVar as a sub-class of Var, and the range field will be able to provide these information (this seems to might be a good change regardless, except that it will result in code refactor)

Pros of this approach:

Same as proposal 2, we can do context-independent analysis
We won’t have the problem of simplifying nested annotation cases as in proposal 2.

Cons of this approach:

A developer need to be mindful about declaring the bound of the variable. For example, we now might need to write:

n = tvm.var("n", nonneg=True)

What if a developer forget to do so, and still uses the normal variable for shape, in theory we allow this case as well, as the additional constraints just helps to generate better, code, should we do a rewriting phase to embed information back, or should we shoot an warning?
We do need to create a new Variable(with new bound information), as we rewrite to a smaller region of the loop.

Proposal 4: Same as Proposal 2, But only Wraps Variables

This avoids cases like nonneg(x * y), because we won’t do auto wrapping for these cases.

Things to Discussions

As we can see, embedding information into the Vars or Exprs bring the benefit of additional analysis without dependent on the context. That does mean, however, we need to choose to recreate the Var with every time we might want to update the hint.

These information can complement the context-dependent analysis. For example, we could declare a variable with bound [0, 100], but then get additional constraints based on the loop or assertions.

Things to discuss:

Which proposal do you like
Shall we make IterVar as subclass of Var
For the additional constraint information, regardless of the nested expression, or embed into Variable, what should that constraint field be like.
- (a) It can be a simple bool nonneg flag, as this is the most important usecase.
- (b) It can be a int64 const_lower_bound and upper bound pair. The upper bound might be useful for memory analysis in dynamic shapes.
- © It should be AssertExpr(any_condition, value), because this is the most general version.
- Of course this is a trade-off. We could always introduce combinations, like (b) primarily but also support ©
Should we build nested expression or embed it into the Variable(perhaps as sub-class)
- In the case of nested expression, should we allow nesting on non-variables. which is less useful and results in problems like ```nonneg(x*y)/nonneg(x)````

During discussion, please keep in mind that we are making tradeoffs here, regardless of which choice, I feel pushing it would make the analysis and codegen towards a better direction.

Please share your thoughts

tqchen · September 21, 2019, 12:55am

everyone is welcomed to discuss

kevinthesun · September 22, 2019, 4:52am

Thanks for bringing this up. This will help resolving the similar issue in Expr Simplifier for tvm.var. I personally prefer options 2 - 4, since they decouple simplifier for symbolic variable from other phases.

For the additional constraint information, option c might be useful in some cases of dynamic shape codegen when we have a non-uniform bucketing space, though this is probably not the most usual case.

comaniac · September 22, 2019, 7:59pm

I personally prefer option 4 actually with a more aggressive refinement: Since we know shapes are always non-negative, we can create a sub-class ShapeVar that can only be referenced as a tensor shape. Developers will write the code like

n = tvm.shapeVar("n")
A = tvm.placeholder(shape=(n, n), name="A")

Pros are like Tianqi has illustrated, and a drawback is that we need to check all variables used to declare tensor shapes are ShapeVar and shoot errors if not, which may hurt user experience somehow.

A less aggressive one is including option 3:

A developer can still declare a variable like n = tvm.var("n") without mentioning it is non-negative or not. At this point, we initialize lower_bound and upper_bound to undefined.
When we reach a reference of n, we check its bonds and rewrite it if valid. For example, when we see tvm.placeholder(shape=(n, n)) and bound are undefined, we rewrite n's bounds to (0, +inf); when we see something like tvm.compute((p, q), lambda i: D[i] + n), we rewrite them to (-inf, +inf).
If we found that the bounds have been defined but inconsistent to what we are going to rewrite, we shoot a warning or error.

For additional constraint information, it’s no doubt that C is the most general solution with the heaviest efforts. For the short term, however, I prefer option B because I feel the upper bound information could also be useful and it takes similar efforts as option A.

haichen · September 23, 2019, 3:38am

I prefer proposal 3. I think the proposal 3 is more extensible where we could embed more information about the shape such as the upper bound and if the shape var is multiple of some constant. They’ll be helpful to memory analysis as well as codegen/AutoTVM. Note that Relay now already has the concept of shape var when defining the type, which it currently is just a wrapper of a TVM var.

For the cons of this approach, I think we might automatically convert the regular var to a shape var when they’re used for shape.

tqchen · September 23, 2019, 3:53pm

Thanks @comaniac @haichen @kevinthesun for helpful discussions. Some more updated thoughts on Proposal 3 vs 4, specifically about creating a sub-class of Var

Because we relies on the visitor pattern to do IR transformation. Creating a sub-class of Var will mean that we need to have different visitor to each of them(at out current setting), unless we have a separate sub-classing pattern, this additional complexity might make sub-classing less appealing to the other two alternatives:

Put lower upper bound field on the Variable itself, without introducing additional ones
- Cons: less comprehensive(cannot do things like multiple constraints as @haichen mentioned)
Compositional approach, introduce an intrinsic, assert_lower_bound(value, bound), but only place it if value is an Var(can do the check during wrapping)

The compositional approach can introduce more intrinsics latter and don’t have to change user’s API. User can still cal tvm.placeholder(shape=(n, n)), and we do the following inside the placeholder function:

def placeholder(shape):
    shape = [tvm.intrin.assert_lower_bound(svalue, 0) 
                   if isinstance(svalue, tvm.expr.Var) else svalue
                   for i, svalue in enumerate(shape)]
    # follow up code

kparzysz · September 23, 2019, 5:39pm

Embedding this kind of information into an expression is impractical. Say, you have an expression x+y, and it has non-negative embedded into it. If, due to some code transformations, we end up with another instance of x+y, it may not have that information in it. This is wrong-in-principle.

Attaching it to a Var is better, but it needs to be propagated to new vars (for example when splitting a loop). This should not use subclasses, because

we should be able to add information to an existing variable,
things like ShapeVar are just too specific about the kind of information they contain.

Personally I think that either assertions, or variable “attributes” are the way to go. We shouldn’t (in the design) be placing any limits as to what kind of information can be represented.

yzhliu · September 23, 2019, 8:24pm

+1 for proposal 3, embedding to Var reduces complexity.
I feel having nonneg flag only is too restricted though it has the most common use case.

tqchen · September 24, 2019, 5:40pm

Summary of current discussion, I think we are converging to two possible candidate solutions:

First of all, many folks seems to agree that nonneg is a bit too restricted, and it would be helpful to have const_upper_bound and const_lower_bound information.

In terms of ways to attach the information, we have two possible candidates

Option A: Embed into Variable

We also talked about sub-classing, but that may not be good due to the reason of additional visitor complexity

class Variable : public Expr {
  public:
     std::string name_hint;
     int64_t const_lower_bound = -kNegInf;
     int64_t const_upper_bound = +kPosInf;
};

Pros and cons

Easy to simplify
Force user to think about bounds
Not composable (cannot attach to an existing variable)

Option B: Use Assertion Style Intrinsics but only allow wraps Vars

n = tvm.var("n")
n_wrapped = tvm.intrin.assert_lower_bound(n, 0)

Pros and cons:

Composable and can attach information without user to worry about it (see the following example)

def placeholder(shape):
    shape = [tvm.intrin.assert_lower_bound(svalue, 0) 
                   if isinstance(svalue, tvm.expr.Var) else svalue
                   for i, svalue in enumerate(shape)]

@kparzysz bought up a good point about we will lost such hints if we wrap a composite expression(aka assert_lower_bound(x + y, 0) could be potentially bad because it harms certain simplification, and we might lose the information. On the other hand, if we force to only wrap over variables it may not become a problem, because the wrapped variable will usually be translated as a whole and we don’t have to remove the hint wrapper until very end of the codegen.

tqchen · September 24, 2019, 5:40pm

everyone it would be great if you can do followup discussion wrt to the new set of refined proposals and feel free to propose alternatives

haichen · September 25, 2019, 6:54pm

I’m in favor of Option A.

The cons of option B is that after you wrap multiple asserts on a variable, it increases the burden of compiler pass and to some extent the pass needs to keep these properties of vars on its own.

On the other hand, you could do similar thing for Option A. When a var is passed to an op as a shape var, the op can automatically update the lower bound to be at least 0.

tqchen · September 25, 2019, 7:54pm

Some clarifications

Updating the bound info on a variable would break the immutable assumption of the Var (because we are creating and then mutating it later), which is not desirable. If we go for the embedding into var route, we would likely need to ask users to explicitly declare non-negative vars.

In terms of the problem of wrapping multiple asserts, the current proposal is only wrap if it is a variable, so there will be at most one assert wrapped around the vars. The compiler does not have to track the properties of vars because the expression of interest will always be assert_lower_bound(var, 0) instead of var itself, and the bound information will be kept as long as do not remove the intrinsic wrapper.

kparzysz · September 26, 2019, 12:56pm

I prefer option B. It does imply extra complexity in compiler passes, but it’s more general. One advantage of it is that more assertions can be added to an existing variable (for example, by some compiler analysis).

jwfromm · September 30, 2019, 1:06am

I definitely prefer option B. Although I understand Haichen’s point that A is more flexible, I think that adding complexity for users should be avoided. A user might expect shape variables to naturally be non-negative and requiring them to specify it will trip up many new users.

tqchen · October 1, 2019, 4:43pm

Thanks for everyone’s discussion. @haichen@yzhliu do you think if you can go with option B? At the moment it seems that technically option B is easier for us to get in right now and has better composibility and future compatibility as @kparzysz

If the wrapper can be applied to any expression, then there will be additional complexities in simplifier etc.

If we restrict the wrapper to var, likely we won’t add additional complexity to passes as well, because the expression will always be transformed as a whole

broune · October 2, 2019, 6:51pm

Hi, I’m new here, though I’ve been involved with similar issues in other compilers. So please allow me to ask some potentially stupid questions:

Doesn’t B also complicate simplification? A rule like X - X -> 0 should presumably also apply to e.g. X - assert_lower_bound(X, 0). Or is the idea that X could never appear both with and without its assertion? These kind of wrapper no-op-ish ops generally lead to trouble.

Aren’t we going to be sad to have this encoded as lower_bound and upper_bound fields on an op? Would be nice to have an object that represents a set of constraints and allows queries like “is X >= Y?” and the implementation can right now be just lower_bound and upper_bound but is easy to modify later, also without changing the API, or at least not much.

If only vars can be tagged and I want to do a replacement of X -> X’ + 1 as part of some optimization, then I can’t just replace uses of X with the new value X’ + 1, because that might create an instance like assert_foo(X’ + 1) which would be malformed. I’d have to figure out how to push the + 1 (or whatever it is) through the assertion and update it correspondingly. Is this a cause for concern?

If we wrap assertions on ops/vars, then where do I put a constraint like X >= Y for two vars, does that wrap the X op or the Y op or both or neither? Or are we not looking to support that kind of constraint (for now)?

A question I have here is what the semantics are if the assertion is not satisfied. Is that undefined behavior and anything could happen? Might an error be issued? If so, is that error guaranteed to be issued? If so, is there an ordering guarantee between two failing asserts of which one is checked first? “assert” suggests a guarantee of an error being issued. Is there a sanitizer mode where the errors are guaranteed, if not otherwise? (in my experience this is a difficult part of this kind of thing)

I understand the need to tag these constraints to parameters to functions, where inside the function you don’t know what values will come in. The examples here are things like a var bound to a loop with a defined range. Couldn’t any optimization interested in that var look up its definition and analyze from that what the range is? (I’m sure I’m missing something here)

Bjarke

tqchen · October 2, 2019, 7:36pm

Thanks for sharing insights. These are great points.

First of all, we do have alternative ways to add additional constraints like X>= Y in the context via AssertStmt, the only drawbacks is that we do need to populate these constraints of vat in the context before we perform simplification. The proposals only sought to complement the existing approaches.

Also the main usecase is to tag the non-negative property of a input shape variable(rows of a matrix). In the above example, the nonnegative info about variable n. and such information is not yet available from loop ranges(although we can add asserts to provide context dependent information-proposal 1 in the first post)

In terms of the mixing asserts with vars without asserts. Most vars we care about are shape variables of input tensors, if we wrap them
during input argument creation time, all the related expression using the var are going to be wrapped by the asserts, there won’t be a mixture of vars that are wrapped and vars that are not(although such form is a valid IR).

If we end up replacing the var by an expression (the assert(X+1) case), the assertion is still valid, but we just don’t construct the wrapping in the first place if it is not a var, so it is mainly a convention to avoid over complicated nested assertions. So the proposal is to allow asserts about expressions as valid IR, but adopts the convention to not construct these wrapping at the first place

Finally, in terms of the semantics, we could explicitly translate these assertions to a runtime assertion, or ignore them and treat it as undefined

broune · October 2, 2019, 9:08pm

Thanks for the background, I have a better idea about it now.

*** Option A
Setting the range on the var flows the information in effect backwards in time, since it is set for the lifetime of the var, also before calling placeholder(). Is there any way that this could go wrong? E.g. maybe someone checks to see if the n is negative, before creating the shape, and halts the program if it is (or if-then-else it out or some such), and then that gets optimized out when it shouldn’t because n is known to be non-negative at a later point in time and that is flowed backwards in time?

*** Option B
I understand the premise to be that by letting placeholder() wrap its array sizes internally, the user will be automatically using the wrapped n. If the user accesses n only through the shape, then I follow how that works. At the top here, in the first post, I see “floormod(i, n)” seemingly using the original n, not a wrapped one. Is that a cause for concern?

If you expect that substitutions will violate the invariant that only vars are wrapped, I think the statement would be that 1) substitutions of that kind shouldn’t be done, or 2) such substitutions should always remove or fix up any wrappers, or 3) the compiler should handle non-vars that are wrapped just fine (which was what we’re trying to avoid) or 4) such cases just won’t be optimized as well. Are we aiming at one of these possibilities with option B, or another possibility that I missed?

I don’t know TVM well enough to have a real opinion here, but I have seen before that no-op-ish wrappers tend to lead to trouble in optimizing compilers.

For what happens when the condition is violated, I’d just point out that it’s a huge difference what is guaranteed. Guaranteeing an in-order error makes everything side-effecting and can be a big burden that prevents optimizations (e.g. dead code elimination - if the error is guaranteed, then you still have to check error cases for ops that you don’t need the value of otherwise) - I’m told this is a big burden for Java compilers. On the other hand it’s confusing if an error is not reported, or only sometimes reported, though that’s how llvm.assume works anyway (https://llvm.org/docs/LangRef.html#llvm-assume-intrinsic). In XLA you might note that most ops are defined even for non-sense input, just so that there is never an error condition, thus getting around this whole question, though that probably isn’t an approach that they can keep up indefinitely. I’d like to suggest some sort of hybrid solution here, maybe along with a sanitizer, but I don’t know TVM well enough to really have an opinion.

tqchen · October 2, 2019, 9:35pm

Good summary of the discussion

Option A: The main problem of allowing mutating an IR nodes is that sometimes we might have context dependent info like if_then_else. It is also an explicit design choice we made in general(make nodes immutable for other gains – shallow copy, thread-safety).

Option B: My current thought is that compiler will handle non-var wrapping cases, but 4) such cases just won’t be optimized as well. Note that we do have the alternative to create a context constraint instead of wrapping them in an expression(which I think should be used for nested expression), i.e.

AssertStmt(n >= 0, 
   for (i) {
     result = floordiv(i, n)
  })

The context assertion enjoys the benefit of not having to created a nested assertion op(makes the expression easier for simplification).
The main challenge of this alternative approach is that sometimes developers are not careful enough to populate the context, or if these expression are in a fragment of DSL(in which case we only have result = floordiv(i, n)) rather than the entire IR. Only constructing such wrapping expression on vars seems to be a good trade-off.

In terms of in-order error, I don’t think we want to guarantee that. As a matter of fact, in most cases the shape of tensors should be positive and the only cases the error occur is when the programmer did something wrong, and the order of error does not matter.

kparzysz · October 3, 2019, 1:47pm

Whatever approach we take, it will complicate optimizations. The focus here is to add the functionality to express conditions in such a way that this complication is added once, and is not compounded with each additional type of condition.

Points to consider:

The conditions should not be needed for correctness, they should only be there to aid optimizations. In particular, if the conditions are dropped we should still generate correct code (although possibly less efficient).
Each optimization that generates new variables should be responsible for correctly maintaining the conditions. In particular, code that replaces variables should handle their enclosing conditions.
Conditions should be context-sensitive, i.e. an instance of a condition should apply to the particular occurrence of a variable, not to all occurrences of it.
There should be a single builtin that can contain all possible conditions, e.g. builtin_condition(x, non_negative, multiple_of(8), ...). This is to avoid expressions like condition1(condition2(x)).