[Relay] Homogenous Compilation Example

jwfromm · May 14, 2019, 1:36am

I’m interested in compiling certain layers of a graph for CPU execution and others for GPU. From the Relay API, it seems like this is possible; however, I can’t find any examples showing the proper way to do it. Does anyone know where I might find one?

zhiics · May 14, 2019, 5:32am

Are you talking about heterogeneous compilation examples? You can find them here:

https://github.com/dmlc/tvm/blob/master/tests/python/relay/test_pass_annotation.py

"""Unit tests for heterogeneous compilation and execution."""
import json
import numpy as np


jwfromm · May 31, 2019, 11:14pm

That is what I was looking for, although I have a related question you might be able to help with. Many useful Relay features rely on annotations that are convenient to apply from Python. However, my main interaction with Relay is through a frontend graph parser that emits an entire Relay function. If I want to annotate certain operations in that function, what’s the intended method? Do I need to write a full IR pass that includes C++ (similar to quantization)?

zhiics · May 17, 2019, 5:10pm

@jwfromm Yes, annotating at the expression level in the Relay source is a little annoying if you mainly work with a program obtained from the parser, since the graph is usually large. Most of the existing annotations are general purpose. If you have special needs, one option is to write another pass. You can still leverage the available operators; only the logic that decides how to annotate the graph needs to be implemented. Hopefully this helps.

jroesch · May 17, 2019, 9:57pm

@jwfromm You can just write a simple pass that annotates the program in Python.

import tvm
from tvm import relay
import tvm.relay.testing
from tvm.relay.expr_functor import ExprMutator

class ScheduleConv2d(ExprMutator):
    """Wrap every nn.conv2d call in an on_device annotation."""
    def __init__(self, device):
        self.device = device
        super().__init__()

    def visit_call(self, expr):
        # Rewrite the arguments first, then decide whether to annotate this call.
        visit = super().visit_call(expr)
        if expr.op == tvm.relay.op.get("nn.conv2d"):
            return relay.annotation.on_device(visit, self.device)
        else:
            return visit

def schedule_conv2d_on_gpu(expr):
    sched = ScheduleConv2d(tvm.gpu(0))
    return sched.visit(expr)

resnet, params = relay.testing.resnet.get_workload()
print(resnet)
resnet = schedule_conv2d_on_gpu(resnet)
print(resnet)
resnet = relay.ir_pass.rewrite_annotated_ops(resnet, 0)
print(resnet)

jonso · May 30, 2019, 12:48am

I ran this example, then tried calling relay.build(resnet, ...) with opt_level=3.

I get the following exception:

TVMError: 103 out of 188 expressions are assigned with virtual device types. Either all or none of the expressions are expected to be annotated.

It seems that there is some assertion that all of the expressions should be annotated. If this is the case, what is the fallback device used for?

After modifying the code to annotate all expressions by returning relay.annotation.on_device(visit, tvm.cpu(0)) instead of visit in the else condition, I get the following exception:

AttributeError: '<class 'tvm.container.Map'>' object has no attribte 'values'

Any guidance would be appreciated!
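
The count in the first error message can be pictured with a toy model. The sketch below is plain Python, not TVM code: `Node`, `annotate`, `annotation_coverage` and `check_all_or_none` are hypothetical names, and the check is only an assumption based on the wording of the error (it compares the number of expressions carrying a device with the total).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Toy expression node (hypothetical, not a Relay class)."""
    op: str                       # e.g. "var", "nn.conv2d", "nn.relu"
    args: Tuple["Node", ...] = ()
    device: Optional[str] = None  # None means "no device annotation"


def annotate(node, predicate, device):
    """Rebuild the tree bottom-up, tagging nodes that match `predicate`."""
    new_args = tuple(annotate(a, predicate, device) for a in node.args)
    tag = device if predicate(node) else node.device
    return Node(node.op, new_args, tag)


def annotation_coverage(node):
    """Return (annotated, total) expression counts for the tree."""
    annotated = 1 if node.device is not None else 0
    total = 1
    for arg in node.args:
        a, t = annotation_coverage(arg)
        annotated += a
        total += t
    return annotated, total


def check_all_or_none(node):
    """Reject partially annotated trees, in the spirit of the error above."""
    annotated, total = annotation_coverage(node)
    if 0 < annotated < total:
        raise ValueError(f"{annotated} out of {total} expressions are annotated")
    return annotated, total


# Toy graph: relu(conv2d(data, weight))
graph = Node("nn.relu", (Node("nn.conv2d", (Node("var"), Node("var"))),))

# Annotating only conv2d leaves the tree partially annotated.
partial = annotate(graph, lambda n: n.op == "nn.conv2d", "gpu")

# Tagging everything that is still unannotated gives full coverage.
full = annotate(partial, lambda n: n.device is None, "cpu")
```

In this model, `check_all_or_none(partial)` raises because only one of four nodes carries a device, while `check_all_or_none(full)` passes. Whether the real pass counts expressions the same way is not confirmed by this thread.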

zhiics · May 31, 2019, 12:15am

@jonso Please refer to

I added an example there.

jwfromm · June 20, 2019, 1:05am

@zhiics I have a follow up question to your commit.

I have a graph that has some custom ops that are only supported on CPU, but other operations will be faster on GPU. Naturally, device annotation is a great solution to this problem.

My code is almost exactly the code in your example, except that I annotate my custom ops on CPU and set the fallback_device to GPU. However, during relay.build I get an error saying my custom ops don’t support the cuda schedule. Why is it trying to schedule annotated ops on cuda? Does Relay first try to schedule using the default and only then apply the annotation?

Here’s what my code looks like:

net, params = load_my_graph()

class ScheduleCustomOp(ExprMutator):
    def visit_call(self, expr):
        visit = super().visit_call(expr)
        if 'customop' in expr.op.name:
            return relay.annotation.on_device(visit, tvm.cpu())
        else:
            return visit

sched = ScheduleCustomOp()
net = sched.visit(net)
net = relay.ir_pass.infer_type(net)

target = {"gpu": "cuda", "cpu": "llvm"}

with relay.build_config(opt_level=3, fallback_device=tvm.gpu()):
    graph, lib, params = relay.build(net, target, params=params)

Running this code gives the error:

RuntimeError: schedule not registered for 'cuda'

Looking at the annotated relay function, I see that my custom op is annotated as expected:

%100 = customop(%99)
%101 = on_device(%100, meta[relay.attrs.OnDeviceAttrs][3])

Any thoughts on how to solve this problem?
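
The behaviour expected in this post can be written down as a small lookup. This is a minimal sketch in plain Python, not the TVM implementation: `resolve_targets` is a hypothetical helper, the op names are made up, and the rule (annotated ops use their annotated device, everything else uses the fallback) is the expectation stated in the thread rather than a confirmed description of `relay.build`.

```python
def resolve_targets(ops, annotations, targets, fallback_device):
    """Map each op name to the target string it is expected to be lowered for.

    ops: op names in execution order
    annotations: op name -> device name, for explicitly annotated ops only
    targets: device name -> target string
    fallback_device: device name used for every op without an annotation
    """
    resolved = {}
    for op in ops:
        device = annotations.get(op, fallback_device)
        if device not in targets:
            raise KeyError(f"no target configured for device {device!r}")
        resolved[op] = targets[device]
    return resolved


targets = {"gpu": "cuda", "cpu": "llvm"}  # same dict as in the post
ops = ["nn.conv2d", "nn.relu", "customop_a", "customop_b"]  # made-up op names
annotations = {"customop_a": "cpu", "customop_b": "cpu"}

plan = resolve_targets(ops, annotations, targets, fallback_device="gpu")

# Only these ops should ever need a cuda schedule under the expected rule.
ops_needing_cuda_schedule = [op for op, tgt in plan.items() if tgt == "cuda"]
```

Under this rule neither custom op appears in `ops_needing_cuda_schedule`, which is why a "schedule not registered for 'cuda'" error on an annotated op is surprising.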

zhiics · June 20, 2019, 4:14am

I am not very familiar with writing schedules, but I think you may not have registered a cuda schedule for your custom op, so code cannot be generated for it.

jwfromm · June 20, 2019, 5:16pm

If I annotate the op to run on CPU, why would it need a CUDA schedule? Does Relay try to schedule on both targets before settling on the annotation target?

jwfromm · June 20, 2019, 6:38pm

It seems like there might be a small bug. I have two custom ops, one right after the other. When I annotate both to run on CPU, the first one is still scheduled on cuda. If I annotate only the first, everything works fine. The second op is also the output node of the graph, so maybe that has something to do with it.

zhiics · June 20, 2019, 6:37pm

This looks like a bug. You don’t need a cuda schedule if the op is assigned to CPU.

zhiics · June 23, 2019, 3:48am

It looks to me like the device type of the conv2d weights is not propagated correctly.

Can you try the following to see if it works?

diff --git a/src/relay/pass/device_annotation.cc b/src/relay/pass/device_annotation.cc
index 8eeb493..1645d75 100644
--- a/src/relay/pass/device_annotation.cc
+++ b/src/relay/pass/device_annotation.cc
@@ -491,8 +491,15 @@ class DeviceInfo {

   void FillPropagation(int out_dev_type) {
     for (const auto& it : post_visitor_.post_dfs_order_) {
-        Expr expr = GetRef<Expr>(it.first);
-        if (!it.second) device_map_.Set(expr, out_dev_type);
+      Expr expr = GetRef<Expr>(it.first);
+      if (!it.second) device_map_.Set(expr, out_dev_type);
+      if (const auto* call = expr.as<CallNode>()) {
+        for (const auto& arg : call->args) {
+          if (arg->is_type<VarNode>() || arg->is_type<ConstantNode>()) {
+            device_map_.Set(arg, device_map_[expr]);
+          }
+        }
+      }
     }
   }

If it works, could you please help debug where it goes wrong? Thanks.
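
To make the effect of the patch easier to follow, here is a toy mirror of the patched loop in plain Python. It is a sketch under assumptions, not the C++ implementation: `Expr` and `fill_propagation` are hypothetical, device types are shown as strings, and the boolean in each pair is modelled only as far as the diff shows (entries with a false flag receive `out_dev_type`; flagged entries are assumed to be in the map already).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression (hypothetical stand-in for a Relay node)."""
    name: str
    kind: str                      # "call", "var" or "constant"
    args: Tuple["Expr", ...] = ()


def fill_propagation(post_dfs_order, device_map, out_dev_type):
    """Assign devices in post-DFS order, mirroring the patched loop."""
    for expr, flag in post_dfs_order:
        if not flag:
            device_map[expr] = out_dev_type
        if expr.kind == "call":
            for arg in expr.args:
                if arg.kind in ("var", "constant"):
                    # Added by the patch: inputs inherit the call's device.
                    device_map[arg] = device_map[expr]
    return device_map


data = Expr("data", "var")
weight = Expr("weight", "constant")
conv = Expr("conv2d", "call", (data, weight))

# conv is already assigned to the CPU; its inputs have no entry yet.
order = [(data, False), (weight, False), (conv, True)]
result = fill_propagation(order, {conv: "cpu"}, out_dev_type="gpu")
```

In this toy run the inputs first receive `"gpu"` from `out_dev_type` and are then overwritten with `"cpu"` when the call is visited, so the call and its variable and constant arguments end up on the same device.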

jwfromm · June 24, 2019, 6:06pm

Your change did fix some of the issues I was having! Specifically, layers that use constant inputs now work great. I’m still running into some problems with convolution layers though. I’ll look into it a little more and see if I can figure out the difference.