Can the Virtual Machine run with dynamic input and a CUDA target?

Hi:

In theory, the Relay Virtual Machine can handle models with dynamic operators. However, when the target of vm.compile is cuda, the virtual machine does not run correctly.

It seems that the shape functions need to run on llvm even though the target is set to cuda, so is there a way to make the virtual machine work for the cuda target?

The error log:

TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (1 == tir.tvm_struct_get(arg1, 0, 10)), Argument arg1.device_type has an unsatisfied constraint

This can be reproduced with the following script:

import tvm
from tvm import relay
import tensorflow as tf
import numpy as np

def main():
    # Build a tiny TensorFlow 1.x graph with a dynamic-shape input
    debug_graph = tf.Graph()
    with debug_graph.as_default():
        input_1 = tf.placeholder(dtype=tf.int32, shape=[None], name='input_1')
        result = tf.nn.relu(input_1, name='result')

    layout = "NHWC"
    mod, params = relay.frontend.from_tensorflow(
        debug_graph.as_graph_def(),
        layout=layout,
        outputs=['result']
    )

    target = "cuda"
    context = tvm.gpu()

    # Compiling for cuda and initializing the VM on the GPU triggers the error above
    exe = relay.vm.compile(mod, target=target, params=params)
    des_vm = tvm.runtime.vm.VirtualMachine(exe)
    des_vm.init(context)
    in_data = np.array([-1, 2, -3, 4, -5], dtype=np.int32)
    ret = des_vm.run(in_data)
    print('result: ', ret)

if __name__ == '__main__':
    main()

Currently, dynamic input and a CUDA device cannot work together, even though only one target is set for compilation. This happens because shape functions are built for the CPU, but the output tensors of those shape functions are allocated on the GPU when the virtual machine is initialized with tvm.gpu().
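As a stopgap, compiling everything for the CPU avoids the device mismatch, since the shape functions and the kernels then run on the same device. A minimal sketch, reusing mod, params, and in_data from the script above:

import tvm
from tvm import relay

# Stopgap: target the CPU for everything so shape functions and kernels
# share a device. `mod`, `params`, and `in_data` come from the script above.
exe = relay.vm.compile(mod, target="llvm", params=params)
des_vm = tvm.runtime.vm.VirtualMachine(exe)
des_vm.init(tvm.cpu())  # CPU context to match the llvm target
ret = des_vm.run(in_data)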

I made a PR to try to fix this bug, but I am not sure whether it is a reasonable solution for the long term. Any suggestion is welcome.

@haichen you might be interested in this.

@lsy643 Yes, you're correct. The current VM doesn't support running on CUDA (or it just leads to poor performance) because, as you mentioned, shape functions need to run on the CPU instead of the GPU. We're working on support for heterogeneous execution in the VM and will probably send a PR in the next one to two weeks.

cc @zhiics
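For context, here is a rough sketch of what that heterogeneous support could look like from user code: compilation takes one target per device kind, and the VM is given both devices so shape-function outputs can stay on the CPU. The per-device target dict and the multi-device VirtualMachine constructor are assumptions about the eventual API, not the API in the release discussed here; mod, params, and in_data are from the repro script above.

import tvm
from tvm import relay, runtime

# Hypothetical heterogeneous setup (assumed API; exact signatures may differ):
# kernels are compiled for CUDA while shape functions are compiled for the CPU.
targets = {"cpu": tvm.target.Target("llvm"), "cuda": tvm.target.Target("cuda")}
exe = relay.vm.compile(mod, target=targets, params=params)

# Give the VM both devices so shape-function outputs can live on the CPU.
vm = runtime.vm.VirtualMachine(exe, [tvm.cpu(), tvm.gpu()])
ret = vm.run(in_data)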

@haichen That would be very nice. I am really looking forward to heterogeneous execution in the VM.