TVM's interaction with PyTorch Geometric

I’m trying to compile a graph neural network model written with PyTorch and an extension called torch_geometric, but it seems that TVM has limited support for the external libraries it depends on, such as torch-scatter, torch-sparse, torch-cluster and torch-spline-conv. I’m very new to TVM, so I’m not 100% sure I’m using it correctly, but here’s the code that triggers the exception:

import tvm
from tvm import relay
import numpy as np
import os.path as osp
from tvm.contrib.download import download_testdata
# PyTorch imports
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
import torch_geometric.transforms as T
from torch_geometric.nn import GATConv

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # note: cached/normalize are GCNConv options; GATConv does not accept them
        self.conv1 = GATConv(1433, 16)
        self.conv2 = GATConv(16, 7)

    def forward(self, x, edge_index):
        c1 = self.conv1(x, edge_index)
        rc1 = F.relu(c1)
        
        d1 = F.dropout(rc1, training=self.training)
        c2 = self.conv2(d1, edge_index)
        r = F.log_softmax(c2, dim=1)
        return r

dataset = 'Cora'
path = osp.join('..', 'data', dataset)
dataset = Planetoid(path, dataset, transform=T.NormalizeFeatures())
data = dataset[0]
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)

data = data.to(device)  # move the graph to the same device as the model
inp = (data.x, data.edge_index)
scripted_model = torch.jit.trace(model, inp).eval()

# the traced model takes two inputs, so list a (name, shape) pair for each
shape_list = [('input0', data.x.size()),
              ('input1', data.edge_index.size())]
mod, params = relay.frontend.from_pytorch(scripted_model,
                                          shape_list)

It seems that a range of ops are not currently supported, including:

 ['aten::_set_item', 'prim::ImplicitTensorToNum', 'aten::__range_length', 'aten::numel', 'aten::__is__', 'prim::unchecked_cast', 'aten::index', 'aten::dim', 'prim::dtype', 'torch_scatter::scatter_max', 'aten::scatter_add_', 'aten::__isnot__', 'aten::index_select']

I believe #5133 already addresses prim::ImplicitTensorToNum, and torch_scatter::scatter_max is specific to torch-scatter’s implementation.

I’m wondering whether supporting ops like these aligns with the current direction of development. Thanks!

Converting graph NNs sounds interesting, and in principle it should be possible.

But for your particular case, aten::_set_item and aten::index_put_ look like in-place update ops (e.g. a[i] = ... in Python). Our PyTorch frontend doesn’t support such side-effecting ops. It might be possible using Relay’s references, but no one has tried so far.
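For illustration, here is a minimal, untested sketch of the difference (the shapes are arbitrary): the in-place scatter_add_ mutates its input and traces to aten::scatter_add_, while the functional scatter_add returns a new tensor.

import torch

index = torch.zeros(2, 8, dtype=torch.long)
src = torch.ones(2, 8)

out = torch.zeros(4, 8)
out.scatter_add_(0, index, src)  # in-place: mutates `out`, traces to aten::scatter_add_

out2 = torch.zeros(4, 8).scatter_add(0, index, src)  # functional: returns a new tensor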


I tried specifically to avoid side effects in my implementation, so I believe they come from torch_geometric’s implementation of the message-passing layers. I also checked that scatter_add_ and index_select are indeed native PyTorch functions, so maybe they are just not implemented yet?

In addition, when using GNN architectures like Graph Attention Networks, torch_geometric uses external ops like torch_scatter::scatter_max. So generally speaking, is it advisable to add support for non-native ops?

Since the number of torch extensions is unbounded and application-specific, we don’t support any of them out of the box. But it is possible to define a custom converter yourself and use it from the frontend.

See this example of how to convert torchvision’s custom op.
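In case it is useful, relay.frontend.from_pytorch takes a custom_convert_map argument for exactly this. Below is a rough, untested sketch for one of the missing ops, aten::numel; the mapping to a product over shape_of is my assumption about the right conversion, not verified against the frontend.

from tvm import relay

# Untested sketch: register a custom converter via custom_convert_map.
def numel_converter(inputs, input_types):
    # numel(x) == product of the dimensions of x's shape (assumed mapping)
    return relay.prod(relay.shape_of(inputs[0]))

custom_map = {'aten::numel': numel_converter}
mod, params = relay.frontend.from_pytorch(scripted_model, shape_list,
                                          custom_convert_map=custom_map)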


I have encountered the same problem using GCNConv from PyTorch Geometric while compiling my model with SageMaker Neo, which uses TVM. It cannot convert the side-effecting ops aten::scatter_add_, aten::masked_fill_, and aten::pow_. Have you been able to solve this error using the custom converter, or am I going to have to create a custom MessagePassing class without those ops (see the sketch below), or switch to GraphConv in DGL?
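For reference, masked_fill_ and pow_ (like scatter_add_ above) have non-mutating counterparts without the trailing underscore, so a custom MessagePassing class could in principle substitute them. A minimal, untested sketch of the substitution:

import torch

# Sketch: in-place ops versus their non-mutating counterparts.
x = torch.randn(4, 4)
mask = x < 0

x.masked_fill_(mask, 0.0)     # mutates x (aten::masked_fill_)
y = x.masked_fill(mask, 0.0)  # returns a new tensor (aten::masked_fill)

x.pow_(2)                     # mutates x (aten::pow_)
z = x.pow(2)                  # returns a new tensor (aten::pow)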