Issue with VM execution when testing GPU immediately after CPU

@haichen @zhiics @jroesch

I’ve been working on separating dynamic ops from static ops (https://github.com/apache/incubator-tvm/pull/5826). The dynamic ops run with the Virtual Machine on either GPU or CPU without any issues, as long as I use only one backend.

I’m running into a bit of a complication with this code, though:

        # ctx_list() yields a (target, ctx) pair for every backend enabled in
        # this build, e.g. ("llvm", cpu(0)) followed by ("cuda", gpu(0))
        for target, ctx in ctx_list():
            for kind in ["vm", "debug"]:
                mod = tvm.ir.IRModule.from_expr(func)
                intrp = relay.create_executor(kind, mod=mod, ctx=ctx, target=target)
                op_res = intrp.evaluate()(x_data, np.array(newshape))
                tvm.testing.assert_allclose(op_res.asnumpy(), ref_res, rtol=1e-5)

If I have a GPU in my system, that loop runs the test on CPU first and then attempts to run on GPU. When it hits the GPU run, I get this error; it seems the VM thinks I should still be passing in CPU data:

TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (1 == tvm_struct_get(arg0, 0, 10)), Argument arg0.device_type has an unsatisfied constraint
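For reference, device_type 1 is CPU and 2 is GPU in DLPack/TVM, so that assert reads as a CPU-compiled kernel complaining that its first argument is not on the CPU. A quick sanity check of the codes (assuming a CUDA-enabled build):

    import tvm

    # Device type codes used in the generated assert above
    print(tvm.cpu(0).device_type)  # 1 (kDLCPU)
    print(tvm.gpu(0).device_type)  # 2 (kDLGPU)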

If I run it just on GPU, I don’t hit the error. I also don’t see the error with the Graph Runtime or the debug backend.

This looks like an issue with some global state inside the VirtualMachine. Are you guys aware of anything that could cause this behavior?

Does adding compile_engine.get().clear() fix it?

Interesting, I haven’t seen this before. We actually have such a test in the unit tests.

No, this gets me the same error:

    for target, ctx in ctx_list():
        for kind in ["vm", "debug"]:
            print(func)
            mod = tvm.ir.IRModule.from_expr(func)
            intrp = relay.create_executor(kind, mod=mod, ctx=ctx, target=target)
            op_res = intrp.evaluate()(*data)
            tvm.testing.assert_allclose(op_res.asnumpy(), ref_res, rtol=1e-5)
            # clear the global compile engine cache after each run, as suggested above
            relay.backend.compile_engine.get().clear()

@zhiics Hmm, I wonder if maybe it’s in my tvm.ir.IRModule.from_expr call

My measurement yesterday was apparently wrong; it does consistently fail on GPU (I mis-measured one test).

I’ve removed my new op from the test and reduced it to this function:

    fn (%x: Tensor[(2, 3, 4), float32], %y: Tensor[(2), int64]) {
      reshape(%x, %y, newshape=None)
    }
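For context, a minimal sketch of how that reduced function can be built in Python (assuming relay.reshape still accepts a tensor newshape argument, as the two-input call in the printed IR above suggests):

    import tvm
    from tvm import relay

    # Reduced repro: reshape a statically shaped tensor to a shape that is
    # only known at runtime (supplied as the second argument).
    x = relay.var("x", shape=(2, 3, 4), dtype="float32")
    y = relay.var("y", shape=(2,), dtype="int64")
    func = relay.Function([x, y], relay.reshape(x, y))
    print(func)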

@haichen @zhiics does the VM not support dynamic shapes on the GPU backend?

I’m noticing that the dynamic reshape @haichen added here only tests the CPU:

@mbrookhart ahh, we only have CPU tests for the dynamic ones. The reason is that shape functions only make sense on CPU. Heterogeneous execution is needed to make dynamic models work. We have some changes locally and we are working on gradually upstreaming them.

Okay, I’ll leave the unit tests on CPU for now. Looking forward to that upstreaming :slight_smile:
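For anyone hitting the same thing, a minimal sketch of what keeping a dynamic-op test on CPU can look like (the "llvm" substring check is just one way to filter targets, not an established convention):

    for target, ctx in ctx_list():
        # Shape functions currently only run on CPU, so skip GPU targets
        # for dynamic ops until the VM supports heterogeneous execution.
        if "llvm" not in str(target):
            continue
        for kind in ["vm", "debug"]:
            mod = tvm.ir.IRModule.from_expr(func)
            intrp = relay.create_executor(kind, mod=mod, ctx=ctx, target=target)
            op_res = intrp.evaluate()(x_data, np.array(newshape))
            tvm.testing.assert_allclose(op_res.asnumpy(), ref_res, rtol=1e-5)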

Thanks!

I wrote a dynamic op and hit the same issue. When do you plan to add support for VM heterogeneous execution? Do you have an approximate timeline? Looking forward to your PR.