[RFC] Visualizing Relay program as graph

Currently, the only way to inspect the Relay program is through the text format:

v0.0.4
fn (%x: Tensor[(21), float32], %y: Tensor[(958, 21), float32]) {
  %0 = subtract(%x, %y);
  %1 = multiply(%0, %0);
  sum(%1, axis=[1])
}

It would be nice if we could visualize the Relay program as a graph, like this: [GraphViz rendering of the example program]

I wrote a rudimentary script to generate GraphViz diagram from a Relay function, but it would be useful if visualization functionality became part of the TVM codebase.
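For illustration, here is a toy sketch (not the actual script) of how Dot text can be generated from a tiny hand-rolled dataflow graph. The node/edge encoding is made up for this example and is not a TVM API:

```python
def to_dot(nodes, edges):
    """Emit a GraphViz Dot program.

    nodes: {node_id: label}, edges: [(src_id, dst_id), ...]
    """
    lines = ["digraph relay {"]
    for node_id, label in nodes.items():
        lines.append('  n%d [label="%s"];' % (node_id, label))
    for src, dst in edges:
        lines.append("  n%d -> n%d;" % (src, dst))
    lines.append("}")
    return "\n".join(lines)

# The example program from above:
# %0 = subtract(x, y); %1 = multiply(%0, %0); sum(%1, axis=[1])
nodes = {0: "x", 1: "y", 2: "subtract", 3: "multiply", 4: "sum"}
edges = [(0, 2), (1, 2), (2, 3), (3, 4)]
print(to_dot(nodes, edges))
```

Feeding the resulting text to the `dot` command-line tool (or any Dot viewer) renders the graph.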

My proposal is to add a function visualize() under the tvm.relay.analysis namespace. The function would accept a Relay Function object as input and produce one of the following:

  • GraphViz Dot program (Dot is a language used in GraphViz)
  • JSON dump, to be ingested by other packages such as Netron. This way, Netron won’t require TVM itself to parse a Relay program.
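As a rough sketch of how such an entry point might dispatch on the output format (the name visualize and the output_format parameter are placeholders that would be settled in the PR, and a plain list of (id, label) pairs stands in for a Relay Function to keep the sketch self-contained):

```python
import json

def visualize(func, output_format="dot"):
    """Return a visualization of `func` in the requested format.

    `func` stands in for a Relay Function; here it is just a list of
    (node_id, label) pairs so the sketch is runnable without TVM.
    """
    if output_format == "dot":
        body = "\n".join('  n%d [label="%s"];' % (i, lbl) for i, lbl in func)
        return "digraph relay {\n%s\n}" % body
    if output_format == "json":
        return json.dumps({"format": "relayviz",
                           "nodes": [{"id": i, "label": lbl} for i, lbl in func]})
    raise ValueError("unsupported format: %r" % output_format)

print(visualize([(0, "x"), (1, "subtract")], "json"))
```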

If this sounds reasonable to the community, I will prepare a pull request.

@haichen @junrushao1994 @yzhliu


Hey Phil, just saw your nice visualization last week. Love it :+1:

BTW, do you have any plan for supporting if-then-else?

@junrushao1994 Yes, the PR will support if-then-else. The if-then-else node will need to take three child nodes: the condition, the true branch, and the false branch.
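For instance, in Dot the three children could be distinguished with edge labels; a hypothetical sketch (the function name and edge styling are illustrative only, not part of the proposal):

```python
def if_node_edges(if_id, cond_id, true_id, false_id):
    """Emit labeled Dot edges from the three children into an If node."""
    return [
        '  n%d -> n%d [label="cond"];' % (cond_id, if_id),
        '  n%d -> n%d [label="true"];' % (true_id, if_id),
        '  n%d -> n%d [label="false"];' % (false_id, if_id),
    ]

print("\n".join(if_node_edges(3, 0, 1, 2)))
```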


+1. I take that back. +1000. Inspectability of the compilation process, including visualization of the IRs, is super important and this is a great step in that direction.

Graphviz is great because it is widely available, many people know it, and it produces an image file that is trivial to open on any machine from a link. On the other hand, Graphviz is not so great at very large graphs. One way to handle that is an option to display a radius around a node (as in the slides below), or an option to fall back to a different library for large graphs, e.g. TensorBoard. I don't know whether Netron is good for large graphs; maybe it is. No need for us to address any of that initially; having this at all is a great advance. It would be good, though, to write the API in a way that makes it natural to add more things later.
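The "radius around a node" idea amounts to a bounded breadth-first search over the graph's adjacency structure. A minimal sketch, assuming an adjacency-list dict (this is not an existing TVM feature):

```python
from collections import deque

def neighborhood(adj, start, radius):
    """Return the set of node IDs within `radius` hops of `start`.

    adj: {node_id: [neighbor_id, ...]}, treated as undirected here
    since display usually wants both producers and consumers.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sorted(neighborhood(adj, 1, 1)))  # → [0, 1, 2]
```

The returned node set could then be passed to the renderer so only that subgraph is drawn.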

(There are some style details that can be worked out later or over time. For example, it would be nice if every op stated what shape/type it produces; rectangular nodes are more economical of space; multiple operands imply a need to distinguish incoming edges; and color coding of ops can be useful for larger examples. Slides 13-15 might offer some inspiration: https://drive.google.com/file/d/1AfNoznbCIejQLErblXF7CV_Wk4cJypuC/view Though this is great as-is!)


Thanks Philip. This will increase productivity and debugging capabilities multi-fold.

Moving down the line, I think we might want to think about how to represent multiple subgraphs (which will arise with operator fusion and graph partitioning), control-flow constructs, and so on. For now, it might be OK to keep things simple.

@broune I appreciate your comments. The slides are certainly a good source of inspiration. I'll keep fiddling with GraphViz to get better-looking graphs.

to have the option to use a different library as well in case of large graphs, e.g. TensorBoard

Yes, I was also worried about using GraphViz on large graphs, which is why I included the JSON dump option. @haichen suggested Netron, which is a JavaScript library. I'll see how it handles a ResNet model.

The TensorBoard option will definitely be useful, since TensorBoard lets you collapse and expand parts of the graph interactively. The disadvantage is that TensorBoard uses Protobuf for serializing deep learning graphs, and Protobuf is a rather heavy dependency in my experience.

We can certainly engineer the API so that dependencies like graphviz or protobuf are lazily imported when tvm.relay.analysis.visualize() is called.
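A minimal sketch of that lazy-import pattern, assuming the third-party graphviz Python package as the renderer (the function name is hypothetical; the key point is that the import happens inside the call, so plain `import tvm` stays cheap):

```python
def visualize_with_graphviz(relayviz_nodes):
    """Render a list of RelayViz-style node dicts with the graphviz package.

    The heavy dependency is imported only when this renderer is invoked.
    """
    try:
        import graphviz  # deferred: only needed for this code path
    except ImportError:
        raise ImportError("this renderer requires the 'graphviz' package")
    dot = graphviz.Digraph()
    for node in relayviz_nodes:
        dot.node(str(node["id"]), node.get("label", ""))
    return dot
```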

Yes, it would be nice to be able to visualize the effect of each individual Relay pass, such as operator fusion. I occasionally have to write a custom Relay pass, and this capability will help me verify that I didn't mess up.

Great proposal. I think it is important to separate the concerns of data representation and rendering:

  • Define a structured specification of the visualization data (e.g. JSON) that can be translated and ingested by other visualizers (Netron, GraphViz)
  • Document that structured specification
  • Work on the exporter
  • Work on the visualization (or perhaps not, depending on how powerful the viz grammar is)

It would also be great to get some feedback from the data-viz community about what specific things could go in the spec.

This idea of separating data and rendering follows the same philosophy as visualization packages like https://vega.github.io/vega-lite/ (except those are for tabular-style data).


@tqchen I like the idea of separating data representation and rendering. Let me draft a spec for data representation soon.


Per @tqchen’s suggestion, we should have Relay produce a JSON-like data representation that can be ingested by multiple visualizers (GraphViz, Netron, TensorBoard). I’ll refer to this data representation as RelayViz. In this document, we write RelayViz objects as JSON objects.

Specification for RelayViz data representation

Root object layout

{
  "format": "relayviz",
  "version": [1, 0],
  "nodes": [ <list of nodes> ]
}

General principles

  • Many nodes in Relay (Call, TupleGetItem, Function, etc.) have references to other nodes. Such references should be encoded as an integer ID. The integer ID locates the referenced node in the “nodes” field of the root object. For example, a Function node has a body attribute that refers to another Relay node, so the Function node should store the node ID of the function body.
  • All attributes of Relay nodes that are not node references should be encoded as plain JSON values (strings, numbers, lists, or nested objects).
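The ID-assignment rule above can be implemented as a memoized post-order traversal, so every child is registered before its parent and shared subexpressions get a single ID. A sketch over nested dicts standing in for Relay nodes (the field names kind/children/attrs are invented for this example; they are not Relay's):

```python
def assign_ids(expr, nodes, memo):
    """Append `expr` and its children to `nodes`, returning expr's ID."""
    key = id(expr)
    if key in memo:                       # shared subexpressions get one ID
        return memo[key]
    entry = {"node_kind": expr["kind"]}
    for field, child in expr.get("children", {}).items():
        entry[field] = assign_ids(child, nodes, memo)  # child ID (integer)
    entry.update(expr.get("attrs", {}))   # non-reference attributes inline
    nodes.append(entry)
    memo[key] = len(nodes) - 1
    return memo[key]

x = {"kind": "Var", "attrs": {"name": "x"}}
call = {"kind": "Call", "children": {"arg": x}, "attrs": {"op": "exp"}}
nodes, memo = [], {}
root = assign_ids(call, nodes, memo)
print(nodes)  # the Var is registered first (ID 0), then the Call
print(root)   # → 1
```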

Example: Function node (one node reference, one attribute)

{
  "node_kind": "Function",
  "body": <node ID of function body>,
  "ret_type": {
    "dtype": "float32",
    "shape": [64]
  }
}

Example: Call node (six node references: the operator and five arguments)

{
  "node_kind": "Call",
  "op": <node ID of the "nn.batch_norm" Op node>,
  "args": [
    <node ID of first argument>,
    <node ID of second argument>,
    <node ID of third argument>,
    <node ID of fourth argument>,
    <node ID of fifth argument>
  ]
}

Spec for particular nodes

This spec should cover all Relay nodes in https://docs.tvm.ai/api/python/relay/expr.html. All appearances of bracketed phrases (<...>) should be interpreted as placeholders.

Function node

{
  "node_kind": "Function",
  "body": <function body (node ID, integer)>,
  "params": [
    <Input variable (node ID, integer)>,
    ...
  ],
  "ret_type": {
    "dtype": <type of tensor element (string)>,
    "shape": <shape (dimension) of tensor (list of integers)>
  }
}

Note: each element in the “params” field must refer to a Variable (Var) node.

Var (variable) node

{
  "node_kind": "Var",
  "name": <name of variable (string)>,
  "dtype": <type of tensor element (string)>,
  "shape": <shape (dimension) of tensor (list of integers)>
}

Note: we set the “shape” field to an empty list ([]) for scalars.

Call node

{
  "node_kind": "Call",
  "op": <name of operator to call (node ID, integer)>,
  "args": [
    <call argument (node ID, integer)>,
    ...
  ]
}

The value for the “op” field shall correspond to an Operator node.

For now, we skip the attrs argument of tvm.relay.expr.Call.

Operator (op) node

{
  "node_kind": "Op",
  "name": <name of operator (string)>,
  "attrs": <attributes for the operator (dict)>
}

The “attrs” field should store the attributes specific to the operator. For example, the convolution operator (conv2d) would store padding and strides as attributes. For the sake of simplicity, I will skip the “attrs” field in my upcoming pull request.

Const (constant) node

{
  "node_kind": "Const",
  "value": <value of the constant (int, float, or dict)>,
  "dtype": <type of tensor element (string)>
}

Note: if the constant is a tensor, then the “value” field will be set to an object of the form

{
  "array_value": <array value (list of floats)>,
  "array_shape": <shape of the tensor (list of integers)>
}
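A sketch of how a constant could be encoded per this rule, with nested Python lists standing in for a NumPy array (encode_const is a hypothetical helper, not part of the proposed API):

```python
def encode_const(value, dtype):
    """Encode a constant: scalars inline, tensors as array_value/array_shape."""
    if not isinstance(value, list):
        return {"node_kind": "Const", "value": value, "dtype": dtype}
    shape, probe = [], value
    while isinstance(probe, list):        # infer shape from list nesting
        shape.append(len(probe))
        probe = probe[0]
    flat = []
    def _flatten(v):
        if isinstance(v, list):
            for item in v:
                _flatten(item)
        else:
            flat.append(v)
    _flatten(value)
    return {"node_kind": "Const",
            "value": {"array_value": flat, "array_shape": shape},
            "dtype": dtype}

print(encode_const(1.5, "float32"))
print(encode_const([[1.0, 2.0], [3.0, 4.0]], "float32"))
```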

Bind node

{
  "node_kind": "Bind",
  "expr": <the input expression (node ID, integer)>,
  "binds": {
    <variable_name> : <expression to bind (node ID, integer)>,
    ...
  }
}

Note: in tvm.relay.expr.bind, the binds argument may be one of two types:

  • Map[tvm.relay.Var, tvm.relay.Expr]
  • Map[str, tvm.relay.Expr]

If binds is of type Map[tvm.relay.Var, tvm.relay.Expr], then set the <variable_name> placeholder to a value of form “node_XXX”, where XXX is the node ID of the Variable node.
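A sketch of normalizing the two accepted key types into the string keys used by the “binds” field (FakeVar is a stand-in for tvm.relay.Var, carrying the node ID its real counterpart would be assigned):

```python
class FakeVar:
    """Stand-in for tvm.relay.Var, carrying its RelayViz node ID."""
    def __init__(self, node_id):
        self.node_id = node_id

def normalize_binds(binds):
    """Map both str and Var keys to the string keys of the spec."""
    out = {}
    for key, expr_id in binds.items():
        if isinstance(key, str):
            out[key] = expr_id                       # Map[str, Expr]
        else:
            out["node_%d" % key.node_id] = expr_id   # Map[Var, Expr]
    return out

print(normalize_binds({"x": 7, FakeVar(3): 8}))  # → {'x': 7, 'node_3': 8}
```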

Tuple node

{
  "node_kind": "Tuple",
  "fields": [
    <tuple element (node ID, integer)>,
    ...
  ]
}

Let node

{
  "node_kind": "Let",
  "variable": <local variable to be bound (node ID, integer)>,
  "value": <value to be bound (node ID, integer)>,
  "body": <body of the let binding (node ID, integer)>
}

If node

{
  "node_kind": "If",
  "cond": <condition to test (node ID, integer)>,
  "true_branch": <expression evaluated when condition is true (node ID, integer)>,
  "false_branch": <expression evaluated when condition is false (node ID, integer)>
}

TupleGetItem node

{
  "node_kind": "TupleGetItem",
  "tuple_value": <input tuple (node ID, integer)>,
  "index": <index of element to be extracted (integer)>
}

The spec proposed here captures all information that can be obtained from the Python interface to Relay. We can discuss further additions to the spec in later work. (For example, I left out all operator-specific attributes.) The spec adopts a JSON-like format so that it is easy to add new attributes.
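As an end-to-end illustration, here is the example program from the top of the thread (subtract, multiply, sum) hand-encoded as a RelayViz document, with operator attrs omitted as the spec allows for now. The node ordering is one valid choice, not mandated by the spec:

```python
import json

doc = {
    "format": "relayviz",
    "version": [1, 0],
    "nodes": [
        {"node_kind": "Var", "name": "x", "dtype": "float32", "shape": [21]},
        {"node_kind": "Var", "name": "y", "dtype": "float32",
         "shape": [958, 21]},
        {"node_kind": "Op", "name": "subtract"},
        {"node_kind": "Call", "op": 2, "args": [0, 1]},     # subtract(x, y)
        {"node_kind": "Op", "name": "multiply"},
        {"node_kind": "Call", "op": 4, "args": [3, 3]},     # multiply(%0, %0)
        {"node_kind": "Op", "name": "sum"},
        {"node_kind": "Call", "op": 6, "args": [5]},        # sum(%1, axis=[1])
        {"node_kind": "Function", "params": [0, 1], "body": 7,
         "ret_type": {"dtype": "float32", "shape": [958]}},
    ],
}
print(json.dumps(doc, indent=2))
```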

Prototype implementation is now put up as a draft pull request: https://github.com/apache/incubator-tvm/pull/4370


Hi guys: I noticed the pull request failed. Do you still plan to add this function? I think it is very useful.

Best