How to use heterogeneous execution?

Hi all. I tried heterogeneous execution with the following code, based on tests/python/relay/test_pass_annotation.py.

import tvm
import tvm.relay as relay
import tvm.contrib.graph_runtime
import numpy as np

R""" The network is as following:                                                                                                                         
           x     y                                                                                                                                        
            \   /                                                                                                                                         
             add                                                                                                                                          
            /   \                                                                                                                                         
         sqrt   log                                                                                                                                       
            \   /                                                                                                                                         
          subtract                                                                                                                                        
              |                                                                                                                                           
             exp                                                                                                                                          
"""

fallback_device = tvm.context("cpu")
target = {"cpu": "llvm", "cuda": "cuda"}
dev_ctx = tvm.context("cuda")
cpu_ctx = fallback_device

x = relay.var("x", shape=(1, 10))
y = relay.var("y", shape=(10, 10))
add = relay.add(x, y)
sqrt = relay.sqrt(add)
_sqrt = relay.annotation.on_device(sqrt, dev_ctx)
log = relay.log(add)
subtract = relay.subtract(sqrt, log)
exp = relay.exp(subtract)
_exp = relay.annotation.on_device(exp, dev_ctx)

# Group the annotated exprs with the real output so the rewrite pass can collect them.
func = relay.Function([x, y], relay.Tuple(tvm.convert([_sqrt, _exp, exp])))
func = relay.ir_pass.infer_type(func)
# Replace the on_device annotations with device_copy nodes; un-annotated ops
# fall back to the device type given as the second argument (CPU here).
func = relay.ir_pass.rewrite_annotated_ops(func, cpu_ctx.device_type)
func = relay.ir_pass.infer_type(func)
# Keep only the third tuple field (exp) as the actual function output.
func = relay.Function(relay.ir_pass.free_vars(func.body[2]), func.body[2])
print(func)

x_data = np.random.rand(1, 10).astype('float32')
y_data = np.random.rand(10, 10).astype('float32')
params = {"x": x_data, "y": y_data}

with relay.build_config(opt_level=1):
    graph, lib, params = relay.build(func, target=target, params=params)

module = tvm.contrib.graph_runtime.create(graph, lib, [cpu_ctx, dev_ctx])
module.set_input(**params)
module.run()
module.get_output(0).asnumpy()

What I expected was:

  • CPU executes ‘add’, ‘log’, and ‘subtract’.
  • GPU executes ‘sqrt’ and ‘exp’.

However, the result shows that all the operators are executed on the GPU.

$ nvprof --print-gpu-summary python hetero_execution.py
fn (%x: Tensor[(1, 10), float32],
    %y: Tensor[(10, 10), float32]) {
  %0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
  %1 = device_copy(%0, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
  %2 = sqrt(%1) # ty=Tensor[(10, 10), float32]
  %3 = device_copy(%2, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
  %4 = log(%0) # ty=Tensor[(10, 10), float32]
  %5 = subtract(%3, %4) # ty=Tensor[(10, 10), float32]
  %6 = device_copy(%5, meta[relay.attrs.DeviceCopyAttrs][2]) # ty=Tensor[(10, 10), float32]
  %7 = exp(%6) # ty=Tensor[(10, 10), float32]
  %7          
}             
# meta data omitted. you can use show_meta_data=True to include meta-data
              
==19110== NVPROF is profiling process 19110, command: python hetero_execution.py
==19110== Profiling application: python hetero_execution.py
==19110== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   26.90%  4.8640us         3  1.6210us  1.3120us  2.0800us  [CUDA memcpy DtoD]
                   19.12%  3.4560us         1  3.4560us  3.4560us  3.4560us  fused_add_kernel0
                   15.58%  2.8160us         2  1.4080us  1.2160us  1.6000us  [CUDA memcpy HtoD]
                   11.33%  2.0480us         1  2.0480us  2.0480us  2.0480us  fused_log_subtract_kernel0
                   10.62%  1.9200us         1  1.9200us  1.9200us  1.9200us  fused_sqrt_kernel0
                    8.50%  1.5360us         1  1.5360us  1.5360us  1.5360us  fused_exp_kernel0
                    7.96%  1.4400us         1  1.4400us  1.4400us  1.4400us  [CUDA memcpy DtoH]

Is this a bug or am I missing something?

I am not sure whether this insight helps resolve the issue, but it seems that this happens when the func goes through ir_pass.fuse_ops (called in https://github.com/dmlc/tvm/blob/master/python/tvm/relay/build_module.py#L269).

import tvm
import tvm.relay as relay
import numpy as np

fallback_device = tvm.context("cpu")
target = {"cpu": "llvm", "cuda": "cuda"}
dev_ctx = tvm.context("cuda")
cpu_ctx = fallback_device

x = relay.var("x", shape=(1, 10))
y = relay.var("y", shape=(10, 10))
add = relay.add(x, y)
sqrt = relay.sqrt(add)
_sqrt = relay.annotation.on_device(sqrt, dev_ctx)
log = relay.log(add)
subtract = relay.subtract(sqrt, log)
exp = relay.exp(subtract)
_exp = relay.annotation.on_device(exp, dev_ctx)

func = relay.Function([x, y], relay.Tuple(tvm.convert([_sqrt, _exp, exp])))
func = relay.ir_pass.infer_type(func)
func = relay.ir_pass.rewrite_annotated_ops(func, cpu_ctx.device_type)
func = relay.ir_pass.infer_type(func)
func = relay.Function(relay.ir_pass.free_vars(func.body[2]), func.body[2])

# Storage_device_info seems correct.
# Each value is [[storage_id], [device_type]].
Storage_device_info = tvm.relay.backend._backend.GraphPlanMemory(func)
for k, [[sid], [dev]] in Storage_device_info.items():
    print(k)
    print(dev)
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
%0

1
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
%1 = device_copy(%0, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
%2 = sqrt(%1) # ty=Tensor[(10, 10), float32]
%3 = device_copy(%2, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
%4 = log(%0) # ty=Tensor[(10, 10), float32]
%5 = subtract(%3, %4) # ty=Tensor[(10, 10), float32]
%6 = device_copy(%5, meta[relay.attrs.DeviceCopyAttrs][2]) # ty=Tensor[(10, 10), float32]
%6
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %y: Tensor[(10, 10), float32]
%y

1
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
%1 = device_copy(%0, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
%1
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %x: Tensor[(1, 10), float32]
%x

1
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
%1 = device_copy(%0, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
%2 = sqrt(%1) # ty=Tensor[(10, 10), float32]
%3 = device_copy(%2, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
%4 = log(%0) # ty=Tensor[(10, 10), float32]
%5 = subtract(%3, %4) # ty=Tensor[(10, 10), float32]
%5
# meta data omitted. you can use show_meta_data=True to include meta-data

1
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
%1 = device_copy(%0, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
%2 = sqrt(%1) # ty=Tensor[(10, 10), float32]
%3 = device_copy(%2, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
%3
# meta data omitted. you can use show_meta_data=True to include meta-data

1
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
%1 = device_copy(%0, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
%2 = sqrt(%1) # ty=Tensor[(10, 10), float32]
%2
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
%1 = device_copy(%0, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
%2 = sqrt(%1) # ty=Tensor[(10, 10), float32]
%3 = device_copy(%2, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
%4 = log(%0) # ty=Tensor[(10, 10), float32]
%5 = subtract(%3, %4) # ty=Tensor[(10, 10), float32]
%6 = device_copy(%5, meta[relay.attrs.DeviceCopyAttrs][2]) # ty=Tensor[(10, 10), float32]
%7 = exp(%6) # ty=Tensor[(10, 10), float32]
%7
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = add(%x, %y) # ty=Tensor[(10, 10), float32]
%1 = log(%0) # ty=Tensor[(10, 10), float32]
%1

1
func = relay.ir_pass.infer_type(func)
func = relay.ir_pass.fuse_ops(func, 1)
func = relay.ir_pass.infer_type(func)

# Storage_device_info seems wrong (only the free vars are on device 1).
Storage_device_info = tvm.relay.backend._backend.GraphPlanMemory(func)
for k, [[sid], [dev]] in Storage_device_info.items():
    print(k)
    print(dev)
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = fn(%p0: Tensor[(1, 10), float32],
        %p1: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %1 = add(%p0, %p1) # ty=Tensor[(10, 10), float32]
  %1
}
%2 = %0(%x, %y) # ty=Tensor[(10, 10), float32]
%3 = fn(%p01: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %4 = device_copy(%p01, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
  %4
}
%5 = %3(%2) # ty=Tensor[(10, 10), float32]
%5
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = fn(%p0: Tensor[(1, 10), float32],
        %p1: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %1 = add(%p0, %p1) # ty=Tensor[(10, 10), float32]
  %1
}
%2 = %0(%x, %y) # ty=Tensor[(10, 10), float32]
%2

2
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = fn(%p0: Tensor[(1, 10), float32],
        %p1: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %1 = add(%p0, %p1) # ty=Tensor[(10, 10), float32]
  %1
}
%2 = %0(%x, %y) # ty=Tensor[(10, 10), float32]
%3 = fn(%p01: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %4 = device_copy(%p01, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
  %4
}
%5 = %3(%2) # ty=Tensor[(10, 10), float32]
%6 = fn(%p02: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %7 = sqrt(%p02) # ty=Tensor[(10, 10), float32]
  %7
}
%8 = %6(%5) # ty=Tensor[(10, 10), float32]
%9 = fn(%p03: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %10 = device_copy(%p03, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
  %10
}
%11 = %9(%8) # ty=Tensor[(10, 10), float32]
%12 = fn(%p04: Tensor[(10, 10), float32],
         %p11: Tensor[(10, 10), float32])
         -> Tensor[(10, 10), float32] {
  %13 = log(%p11) # ty=Tensor[(10, 10), float32]
  %14 = subtract(%p04, %13) # ty=Tensor[(10, 10), float32]
  %14
}
%15 = %12(%11, %2) # ty=Tensor[(10, 10), float32]
%16 = fn(%p05: Tensor[(10, 10), float32])
         -> Tensor[(10, 10), float32] {
  %17 = device_copy(%p05, meta[relay.attrs.DeviceCopyAttrs][2]) # ty=Tensor[(10, 10), float32]
  %17
}
%18 = %16(%15) # ty=Tensor[(10, 10), float32]
%19 = fn(%p06: Tensor[(10, 10), float32])
         -> Tensor[(10, 10), float32] {
  %20 = exp(%p06) # ty=Tensor[(10, 10), float32]
  %20
}
%21 = %19(%18) # ty=Tensor[(10, 10), float32]
%21
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %y: Tensor[(10, 10), float32]
%y

1
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = fn(%p0: Tensor[(1, 10), float32],
        %p1: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %1 = add(%p0, %p1) # ty=Tensor[(10, 10), float32]
  %1
}
%2 = %0(%x, %y) # ty=Tensor[(10, 10), float32]
%3 = fn(%p01: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %4 = device_copy(%p01, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
  %4
}
%5 = %3(%2) # ty=Tensor[(10, 10), float32]
%6 = fn(%p02: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %7 = sqrt(%p02) # ty=Tensor[(10, 10), float32]
  %7
}
%8 = %6(%5) # ty=Tensor[(10, 10), float32]
%9 = fn(%p03: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %10 = device_copy(%p03, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
  %10
}
%11 = %9(%8) # ty=Tensor[(10, 10), float32]
%11
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = fn(%p0: Tensor[(1, 10), float32],
        %p1: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %1 = add(%p0, %p1) # ty=Tensor[(10, 10), float32]
  %1
}
%2 = %0(%x, %y) # ty=Tensor[(10, 10), float32]
%3 = fn(%p01: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %4 = device_copy(%p01, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
  %4
}
%5 = %3(%2) # ty=Tensor[(10, 10), float32]
%6 = fn(%p02: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %7 = sqrt(%p02) # ty=Tensor[(10, 10), float32]
  %7
}
%8 = %6(%5) # ty=Tensor[(10, 10), float32]
%9 = fn(%p03: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %10 = device_copy(%p03, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
  %10
}
%11 = %9(%8) # ty=Tensor[(10, 10), float32]
%12 = fn(%p04: Tensor[(10, 10), float32],
         %p11: Tensor[(10, 10), float32])
         -> Tensor[(10, 10), float32] {
  %13 = log(%p11) # ty=Tensor[(10, 10), float32]
  %14 = subtract(%p04, %13) # ty=Tensor[(10, 10), float32]
  %14
}
%15 = %12(%11, %2) # ty=Tensor[(10, 10), float32]
%16 = fn(%p05: Tensor[(10, 10), float32])
         -> Tensor[(10, 10), float32] {
  %17 = device_copy(%p05, meta[relay.attrs.DeviceCopyAttrs][2]) # ty=Tensor[(10, 10), float32]
  %17
}
%18 = %16(%15) # ty=Tensor[(10, 10), float32]
%18
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = fn(%p0: Tensor[(1, 10), float32],
        %p1: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %1 = add(%p0, %p1) # ty=Tensor[(10, 10), float32]
  %1
}
%2 = %0(%x, %y) # ty=Tensor[(10, 10), float32]
%3 = fn(%p01: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %4 = device_copy(%p01, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
  %4
}
%5 = %3(%2) # ty=Tensor[(10, 10), float32]
%6 = fn(%p02: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %7 = sqrt(%p02) # ty=Tensor[(10, 10), float32]
  %7
}
%8 = %6(%5) # ty=Tensor[(10, 10), float32]
%8
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %x: Tensor[(1, 10), float32]
free_var %y: Tensor[(10, 10), float32]
%0 = fn(%p0: Tensor[(1, 10), float32],
        %p1: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %1 = add(%p0, %p1) # ty=Tensor[(10, 10), float32]
  %1
}
%2 = %0(%x, %y) # ty=Tensor[(10, 10), float32]
%3 = fn(%p01: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %4 = device_copy(%p01, meta[relay.attrs.DeviceCopyAttrs][0]) # ty=Tensor[(10, 10), float32]
  %4
}
%5 = %3(%2) # ty=Tensor[(10, 10), float32]
%6 = fn(%p02: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %7 = sqrt(%p02) # ty=Tensor[(10, 10), float32]
  %7
}
%8 = %6(%5) # ty=Tensor[(10, 10), float32]
%9 = fn(%p03: Tensor[(10, 10), float32])
        -> Tensor[(10, 10), float32] {
  %10 = device_copy(%p03, meta[relay.attrs.DeviceCopyAttrs][1]) # ty=Tensor[(10, 10), float32]
  %10
}
%11 = %9(%8) # ty=Tensor[(10, 10), float32]
%12 = fn(%p04: Tensor[(10, 10), float32],
         %p11: Tensor[(10, 10), float32])
         -> Tensor[(10, 10), float32] {
  %13 = log(%p11) # ty=Tensor[(10, 10), float32]
  %14 = subtract(%p04, %13) # ty=Tensor[(10, 10), float32]
  %14
}
%15 = %12(%11, %2) # ty=Tensor[(10, 10), float32]
%15
# meta data omitted. you can use show_meta_data=True to include meta-data

2
free_var %x: Tensor[(1, 10), float32]
%x

1

It looks like this is a bug. I will look into it. Thanks for reporting it.

Hi @kazum @zhiics
I know most of the code is from tests/python/relay/test_pass_annotation.py, but I don't really understand the intuition of how the code does what it is supposed to do. Would you mind commenting on my questions?

x = relay.var("x", shape=(1, 10))
y = relay.var("y", shape=(10, 10))
add = relay.add(x, y)
sqrt = relay.sqrt(add)
_sqrt = relay.annotation.on_device(sqrt, dev_ctx)
log = relay.log(add)
subtract = relay.subtract(sqrt, log)
  1. About that last line: why isn't it subtract = relay.subtract(_sqrt, log)? I would guess that _sqrt is just a copy of the original expression with the annotation added, so why not give it as the input to subtract?

  2. What does the line func = relay.Function([x, y], relay.Tuple(tvm.convert([_sqrt, _exp, exp]))) do? Is this basically saying “replace the copies of _sqrt and _exp in the graph with output exp and create a new function”?

  3. Why do you call relay.ir_pass.rewrite_annotated_ops(func, cpu_ctx.device_type) and not relay.ir_pass.rewrite_annotated_ops(func, dev_ctx.device_type)? My intuition would have been that rewrite_annotated_ops would require the dev_ctx device, not the cpu_ctx.

  4. How does relay.build() handle the fact that now target is a dictionary?

Thanks a lot :slight_smile:

@kazum, @imorinaga The problem is that I stepped into the fused ops and appended them to the post_dfs_order list. I shouldn't have done that, because we should only consider the call nodes and check the device_copy node in the callee function. I will fix it. Sorry for the inconvenience.

Best,
Zhi

@aca88 Thanks for asking.

  1. Yes, we could do it the way you are mentioning here and replace the annotation nodes with copy nodes, but then users have to re-connect the AST manually. That might not be convenient when the network is large, so I let users annotate the expr and reconnect it in the program later (the alternative wiring is sketched after this list).

  2. This is related to your question 1. Yes, if we do it the way you mentioned above, we can rewrite the program instead of annotating it. Otherwise, we need to pass the annotation nodes along to make sure we can collect them when we traverse the tree from the exit node.

  3. The second argument in rewrite_annotated_ops is the fallback device type. It could be any device. In the example, I let the nodes that are not specifically annotated fall back to the cpu.

  4. Please refer to the code here: https://github.com/dmlc/tvm/blob/master/python/tvm/relay/build_module.py#L264
    Only the target parameter of the build interface is changed. The fallback device is passed through the build config.
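
To make points 1, 3, and 4 concrete, here is a minimal, untested sketch combining the alternative wiring from question 1 with the dictionary target and the config-based fallback. The fallback_device option of relay.build_config is my assumption based on the test code of that era, so double-check it against your TVM version:

import tvm
import tvm.relay as relay

dev_ctx = tvm.context("cuda")
cpu_ctx = tvm.context("cpu")

x = relay.var("x", shape=(1, 10))
y = relay.var("y", shape=(10, 10))
add = relay.add(x, y)
# Connect the annotated expressions directly instead of keeping them on the side.
sqrt = relay.annotation.on_device(relay.sqrt(add), dev_ctx)
log = relay.log(add)
subtract = relay.subtract(sqrt, log)
exp = relay.annotation.on_device(relay.exp(subtract), dev_ctx)

func = relay.Function([x, y], exp)
func = relay.ir_pass.infer_type(func)
# Ops without an annotation fall back to the device type given as the second argument.
func = relay.ir_pass.rewrite_annotated_ops(func, cpu_ctx.device_type)

# `target` maps device names to compilation targets; the fallback device is
# passed through the build config (fallback_device is assumed to exist here).
target = {"cpu": "llvm", "cuda": "cuda"}
with relay.build_config(opt_level=1, fallback_device=cpu_ctx):
    graph, lib, params = relay.build(func, target=target)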

Hopefully this helps.

Thanks for the fast reply :slight_smile:

Yes, this really helped a lot.

I will try to look into the code and see if I come up with any other questions.

@zhiics, thanks for your fix in https://github.com/dmlc/tvm/pull/2622. It looks like it solves the problem. :slight_smile:

I have another question. Can we remove “relay.ir_pass.rewrite_annotated_ops” from my example? I think relay.build will apply the pass at https://github.com/dmlc/tvm/blob/master/python/tvm/relay/build_module.py#L352, but that code doesn't seem to work as expected.

I tried the following code and nvprof said that GPU is not used at all.

import tvm
import tvm.relay as relay
import tvm.contrib.graph_runtime
import numpy as np

fallback_device = tvm.context("cpu")
target = {"cpu": "llvm", "cuda": "cuda"}
dev_ctx = tvm.context("cuda")
cpu_ctx = fallback_device

x = relay.var("x", shape=(1, 10))
y = relay.var("y", shape=(10, 10))
add = relay.add(x, y)
sqrt = relay.sqrt(add)
_sqrt = relay.annotation.on_device(sqrt, dev_ctx)
log = relay.log(add)
subtract = relay.subtract(sqrt, log)
exp = relay.exp(subtract)
_exp = relay.annotation.on_device(exp, dev_ctx)

func = relay.Function([x, y], relay.Tuple(tvm.convert([_sqrt, _exp, exp])))
func = relay.Function(relay.ir_pass.free_vars(func.body[2]), func.body[2])

x_data = np.random.rand(1, 10).astype('float32')
y_data = np.random.rand(10, 10).astype('float32')
params = {"x": x_data, "y": y_data}

with relay.build_config(opt_level=1):
    graph, lib, params = relay.build(func, target=target, params=params)

module = tvm.contrib.graph_runtime.create(graph, lib, [cpu_ctx, dev_ctx])
module.set_input(**params)
module.run()
module.get_output(0).asnumpy()

Yes, this is actually related to the question @aca88 asked. If you pass it that way without connecting the nodes, the annotation nodes go away. You can print the func before you call build and you will see it is the same as the original graph. Or you can add:

func = ir_pass.infer_type(func)
func = expr.Function(ir_pass.free_vars(func.body[-1]), func.body[-1])
# or, in a better way, if you only have one output:
func = ir_pass.infer_type(func)
if isinstance(func.body, (list, tuple, expr.Tuple)):
    func = expr.Function(ir_pass.free_vars(func.body[-1]), func.body[-1])

before device_map at line 355 in https://github.com/dmlc/tvm/blob/master/python/tvm/relay/build_module.py,
and you can then pass func = relay.Function([x, y], relay.Tuple(tvm.convert([_sqrt, _exp, exp]))). I think you would then get the same result. The reason I didn't put these two lines there is that it might cause problems when you have multiple outputs. I planned to send a PR to support this, but I haven't had much time recently. You are welcome to work on it if you are interested.
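
A quick way to check whether the annotations survived, as suggested above (a small sketch, assuming the func from the snippet you posted):

func = relay.ir_pass.infer_type(func)
# If no device_copy(...) calls appear in the dump, the on_device annotations
# were dropped and everything will run on the fallback device.
print(func)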

@zhiics, thanks for your explanation. I’ve understood how heterogeneous execution is implemented in TVM. The current flow to use heterogeneous execution looks reasonable to me. :slight_smile:

Thanks a lot!
