IR with let scope and without let scope

Hi all,

According to Relay’s paper, let scoping is a fundamental design choice compared to a dataflow-graph IR. However, I recently came across the following work:

  • An interesting point is that this work uses a graph-based IR (sea of nodes), which is more compact and consumes less compilation time.
  • It also drops let scoping, which is cumbersome to maintain across certain transformations.

“The disadvantage of nesting becomes clear in the example in Figure 3. Assume we want to add a conditional branch in a to b and an additional function z, which in turn branches to c. The original nesting must be repaired since c needs to be visible from both b and z (see Figure 3b). Block/let floating has to be applied to float c into a. The reverse transformation, i.e. sinking c into b, is known as block sinking. Such situations occur, for example, during jump threading.”
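The situation described in the quote can be illustrated with a toy sketch, using nested Python functions as a stand-in for nested let scopes. The names `a`, `b`, `c`, `z` mirror the figure being discussed; this is an illustration, not Thorin code.

```python
# Before: c is nested inside b, so it is visible only from b.

def a_before(x):
    def b(y):
        def c(v):          # c is in b's scope
            return v + 1
        return c(y * 2)
    return b(x)

# After adding a new function z that also branches to c, the nesting
# must be "repaired": c is floated into a so both b and z can see it.

def a_after(x, take_branch):
    def c(v):               # floated: now visible from both b and z
        return v + 1
    def b(y):
        return c(y * 2)
    def z(y):
        return c(y - 1)
    return b(x) if take_branch else z(x)
```

The reverse direction (sinking `c` back into `b` once `z` no longer uses it) is the block sinking the quote mentions.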

The context of this work is supporting higher-order functions (HOF) in an LLVM-style setting, so we may have different considerations when designing for AI applications.

My questions:

  1. What does a sea-of-nodes IR look like compared to the Relay IR?
  2. Does the disadvantage of nested scoping apply to Relay or to AI applications?

@MarisaKirisame @tqchen @jroesch @merrymercy

In terms of optimization capability for inference, I don’t think there is any fundamental difference between a sea-of-nodes IR and the Relay IR. If there are no closures/refs, they can be used interchangeably in most cases. Therefore, theoretically, there is no disadvantage to using Relay.

However, when it comes to complicated neural nets containing closures, a sea-of-nodes IR cannot naturally express them - I mean, you may still do so, but it would be more error-prone and less friendly to the optimizer. And for training, closure-based autodiff does generate closures.

1: Relay has a graph mode that sort of works without scope. Let’s not talk about that, as people basically only use it for straight-line code, and discussing it only complicates the issue. The biggest difference of SoN compared to Relay’s AST is that it forces continuation-passing style. If we used it, we would have to do optimization in continuation-passing style, which has its pros and cons. We currently have a CPS conversion pass, but it is known that once you convert, it is hard to go back.
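For readers unfamiliar with the style being discussed, here is a minimal sketch of continuation-passing style in plain Python: direct-style code returns values, while CPS code passes every intermediate result to an explicit continuation `k`. The function names are illustrative only.

```python
# Direct style: results are returned.
def add_direct(x, y):
    return x + y

# CPS: results are handed to an explicit continuation k.
def add_cps(x, y, k):
    return k(x + y)

def square_cps(x, k):
    return k(x * x)

# (x + y)^2 in CPS: each step names its continuation explicitly,
# so control flow becomes part of the program's structure.
def sum_of_square_cps(x, y, k):
    return add_cps(x, y, lambda s: square_cps(s, k))
```

Notice how the evaluation order is frozen into the nesting of continuations; recovering the original direct-style nesting from this form is the "hard to go back" part.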

2: In Relay, scoping is handled by LetList::With, which uses an RAII-style interface to make sure things are in scope. Sometimes things go out of scope (see the lambda-lift bug), but we get on with it. I don’t think it is a big enough deal to redo Relay, but maybe we should try it if we restart from scratch.
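A hypothetical Python analogue of the idea behind `LetList::With` may make this concrete: bindings are collected while a body is being built, and the scope wraps them into nested lets when it closes. The class name, method names, and the tuple encoding of expressions below are all illustrative, not TVM's actual API.

```python
class LetList:
    def __init__(self):
        self.bindings = []            # (name, expr), outermost first

    def push(self, name, expr):
        self.bindings.append((name, expr))
        return name                   # refer to the binding downstream

    @staticmethod
    def with_scope(build_body):
        # analogue of LetList::With: run the builder, then close the
        # scope by wrapping the body in nested lets on the way out
        ll = LetList()
        body = build_body(ll)
        for name, expr in reversed(ll.bindings):
            body = ("let", name, expr, body)
        return body

def build(ll):
    t0 = ll.push("t0", ("add", "x", "y"))
    t1 = ll.push("t1", ("mul", t0, t0))
    return t1

expr = LetList.with_scope(build)
# expr == ("let", "t0", ("add", "x", "y"),
#          ("let", "t1", ("mul", "t0", "t0"), "t1"))
```

The point of the pattern is that the builder never has to decide binding locations by hand; the scope object does it when the scope ends, which is also exactly where "stuff goes out of scope" bugs can sneak in.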

Junru, thanks for the reply.

I agree the optimization capability of the two is equivalent, but there are differences in the details. The difference is that sea of nodes embeds use-def chains into the graph IR, so there is no need to do separate program analysis, and SoN can do parse-time optimizations. See Cliff’s paper.
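The "use-def chains embedded in the IR" point can be sketched in a few lines. In a sea-of-nodes-flavored IR, each node directly holds its defining inputs, so the constructor itself can optimize (parse-time optimization in Cliff Click's sense). This is a toy, not Click's actual node hierarchy:

```python
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

def const(v):
    return Node("const", v)

def add(a, b):
    # the use-def edge is right here in hand: fold constants
    # at construction time, before any separate analysis pass runs
    if a.op == "const" and b.op == "const":
        return const(a.inputs[0] + b.inputs[0])
    return Node("add", a, b)

n = add(const(2), const(3))     # folded to a constant at parse time
m = add(const(2), Node("param", "x"))   # stays a real add node
```

In a flat or AST-based IR, the same folding needs a prior pass to recover who defines each operand; here the edges are the analysis.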

Thorin (see the first link) is a work based on SoN (second link) to support higher-order functions, so do you mean SoN can’t naturally express closures, but Thorin can?


Thanks for the reply, @MarisaKirisame.

1. Got it.

2. So nested scoping is not a big issue. Also, IMO the transformations like jump threading mentioned in the Thorin paper seem absent in deep learning - there is obviously no goto in the IR.

BTW, Myia uses ANF with sea of nodes, also without scope. But Myia’s paper doesn’t mention any details of its IR design.

I assume Myia has migrated to Relay?

I am not sure what ANF without scoping means…

Hmmmm, I think other ways to do program analysis may include those done on continuation-passing style (CPS).

I think the most intriguing thing about Relay is its capability to do AD. Let’s consider the case where a program can contain mutually recursive function calls, and we want to do higher-order AD.
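To see why AD is where closures become unavoidable (as mentioned above, closure-based autodiff generates closures), here is a scalar-only sketch of closure-based reverse-mode AD: the forward pass of each op returns its value plus a backprop closure that captures the inputs. This follows the general "functional framework" style of reverse mode; it is a simplification, not Relay's gradient pass.

```python
def mul(x, y):
    def backward(dout):
        return (dout * y, dout * x)   # closes over x and y
    return x * y, backward

def add(x, y):
    def backward(dout):
        return (dout, dout)
    return x + y, backward

def f(x, y):                # f(x, y) = (x * y) + x
    p, bp_mul = mul(x, y)
    s, bp_add = add(p, x)
    def backward(dout):     # composes the captured backprop closures
        dp, dx2 = bp_add(dout)
        dx1, dy = bp_mul(dp)
        return (dx1 + dx2, dy)
    return s, backward

val, bp = f(3.0, 4.0)
# bp(1.0) gives (df/dx, df/dy) = (y + 1, x) = (5.0, 3.0)
```

Every call site produces fresh closures capturing its intermediates, which is exactly the kind of program a scoped, higher-order IR expresses naturally.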

Yes, I just want to learn something; I don’t care about the status of the project.

  • Thorin IR = CPS + sea of nodes + no let scope
  • Myia IR = ANF + sea of nodes + no let scope

Myia’s IR borrows the design of Thorin.

Are you talking about the options for program analysis in Relay?

Program analysis is a semantic question, which is almost orthogonal to the syntax you choose.

The problem with Relay is that the most complex code we tried is TreeLSTM (not even training), so we have no idea whether we need more complex transformations.

The following summarizes some of the tradeoffs.

Relay’s dataflow-style IR corresponds to the no-scope version, which is easier for high-level transformations (because there is no need to decide the binding location of a variable). Note that there is still a “scope”, implicitly defined by the LCA of all the reference points of a node.
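The implicit-scope idea can be made concrete with a toy: if scopes form a tree and each use site of a node is identified by its path from the root, the node's implicit binding scope is the lowest common ancestor (LCA) of those paths. This is a sketch of the idea, not Relay's actual scope-recovery code.

```python
def lca(path_a, path_b):
    # paths are lists of scope names from the root downward
    i = 0
    while i < min(len(path_a), len(path_b)) and path_a[i] == path_b[i]:
        i += 1
    return path_a[:i]

def implicit_scope(use_paths):
    # the binding scope is the LCA over all use sites
    scope = use_paths[0]
    for p in use_paths[1:]:
        scope = lca(scope, p)
    return scope

# If c is referenced from inside both b and z (both nested in a),
# its implicit binding scope is their common parent a.
uses = [["a", "b"], ["a", "z"]]
# implicit_scope(uses) == ["a"]
```

This is the same "float c into a" situation from the Thorin quote, recovered automatically from the uses rather than maintained as explicit syntax.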

When we are doing code lowering, though, it is helpful to have explicit binding locations; that is why Relay also supports scopes.

In general, deep learning system engineers are familiar with the dataflow-style IR and the PL folks are more familiar with the ANF form; we try to be practical and allow both in different phases of the optimization.
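The conversion between the two phases can be sketched in a few lines: turning a dataflow-style (nested) expression into A-normal form means giving every intermediate result a let-bound name, which is exactly the explicit binding location useful for lowering. The tuple encoding below is illustrative, not Relay's ToANormalForm pass.

```python
counter = [0]

def fresh():
    counter[0] += 1
    return f"t{counter[0]}"

def to_anf(expr, bindings):
    # expr is either a str (a variable) or an (op, lhs, rhs) tuple
    if isinstance(expr, str):
        return expr
    op, a, b = expr
    a_name = to_anf(a, bindings)       # name the subexpressions first
    b_name = to_anf(b, bindings)
    name = fresh()
    bindings.append((name, (op, a_name, b_name)))
    return name

bindings = []
result = to_anf(("add", ("mul", "x", "y"), "x"), bindings)
# bindings == [("t1", ("mul", "x", "y")), ("t2", ("add", "t1", "x"))]
```

Each entry in `bindings` is a let with a decided location, which is the information the dataflow form deliberately leaves implicit.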

Hi Junru, I am a friend of xqdan. Thanks for his invitation.

I think closures may be used anywhere, as free variables are a fundamental construct. For example, a parameter could be a free variable in a nested sub-net during inference. AD is not the only capability of Relay; every HOF IR following the ideas of “Reverse-Mode AD in a Functional Framework” could do the same thing, like Myia.

My question is: could Relay borrow the idea from Thorin to avoid the annoying syntax-level scope in IR analysis and transformation? If not, why is scope so necessary for Relay to keep in the syntax, instead of expressing it implicitly through def-use data dependences like Thorin and Myia do?

@tqchen Thanks for the reply.

Relay has a very clean and effective design and implementation, like the different IRs for different problems or developers that you mentioned here.

At the very beginning, we thought a dataflow graph and a functional-style IR like Relay’s were just two different choices of high-level IR. However, when we saw the papers on sea of nodes and Thorin, we realized that this is much different from a dataflow graph, although it is graph-based. There is a post @GongChen recommended; it’s in Chinese, but it’s worth sharing here.

We are also heavy users of the TVM low-level IR, and we suffer from missing program analyses such as use-def chains. One possible solution is to add program analyses on top of the current IR; another is to build an IR with this information embedded in the IR itself, like sea of nodes. IMO, we may need to consider different choices for the evolution of the TVM IR in the future.
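The first option mentioned above can be sketched as a separate analysis over a flat, statement-list IR: use-def chains are computed by a pass, in contrast to a sea of nodes, where the edges themselves are the chains. The statement encoding is illustrative, not TVM's low-level IR.

```python
def use_def_chains(stmts):
    # stmts: list of (dest, op, operands); operands are variable names
    defs = {}                       # var -> index of its defining stmt
    uses = {}                       # var -> indices of stmts using it
    for i, (dest, _op, operands) in enumerate(stmts):
        for v in operands:
            uses.setdefault(v, []).append(i)
        defs[dest] = i
    return defs, uses

stmts = [
    ("t0", "add", ("x", "y")),
    ("t1", "mul", ("t0", "t0")),
    ("t2", "add", ("t1", "x")),
]
defs, uses = use_def_chains(stmts)
# defs["t0"] == 0; uses["t0"] == [1, 1]; uses["x"] == [0, 2]
```

The recurring cost of this approach is that every transformation that rewrites `stmts` must re-run (or incrementally maintain) the analysis, which is the maintenance burden the embedded-chains design avoids.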

Hi Tianqi, thanks for the reply. I get your idea of the tradeoffs; this may be a good engineering tradeoff. But a two-style IR may also introduce problems: (1) it may make the compilation pipeline complex; (2) the dataflow-to-ANF-to-CPS conversion is costly and actually redundant for lowering; (3) it may introduce annoying bugs in the conversion when new IR-related features are being developed.

Thorin and Myia give a simple functional IR in graph style, and DL and PL engineers can each understand the IR in their own familiar way. Wouldn’t keeping the IR in a single semantics at the same level be more practical for a compiler system?

I agree that program analysis is orthogonal to the syntax of the IR. However, how you express the program in the IR affects how you get the information you want; for example, one purpose of SoN is to save some program analyses, since the information is already embedded in the IR itself.