How can we run an inference synchronously?

Hello,

I have tested inference for a few models - squeezenet1.1, resnet18_v1, and inceptionv3 - on a Mali GPU, measured the performance, and compared the CPU and GPU results.

While measuring the performance on the GPU, I found that the GPU operations are not completed when run() returns.
Instead, it seems the operations are completed at TVMArrayCopyFromTo(gpu_y, cpu_y, ...).
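Roughly, the timing pattern looks like the sketch below (setup omitted; gpu_y and cpu_y are the GPU- and CPU-side output tensors, mod is the loaded graph runtime module, and the timing code is just for illustration):

```cpp
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <chrono>
#include <iostream>

void measure(tvm::runtime::Module mod, DLTensor* gpu_y, DLTensor* cpu_y) {
  tvm::runtime::PackedFunc run = mod.GetFunction("run");

  auto t0 = std::chrono::steady_clock::now();
  run();                                      // returns almost immediately
  auto t1 = std::chrono::steady_clock::now();

  TVMArrayCopyFromTo(gpu_y, cpu_y, nullptr);  // GPU work finishes in here
  auto t2 = std::chrono::steady_clock::now();

  std::cout << "run():  " << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n"
            << "copy(): " << std::chrono::duration<double, std::milli>(t2 - t1).count() << " ms\n";
}
```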

Is there any API to make sure we wait for the completion of all GPU operations?

Thanks,
Inki Dae

You can use ctx.sync().

Thanks for the answer. :slight_smile:

Thanks,
Inki Dae

BTW, I use C++ code on the device, so I cannot use ctx.sync(). Is there a C++-based sync API?
I see the TVMSynchronize function, but it seems that creating a runtime stream is required. Is there an example of how to use it?

Thanks,
Inki Dae

You can check what ctx.sync() does here.

So you should be able to pass a null pointer to TVMSynchronize on the C++ side too. Or you can always use the OpenCL runtime API directly.
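For example, something like this should work on the C++ side (a minimal sketch; kDLOpenCL and device id 0 are assumptions for a single Mali/OpenCL device, and the null stream argument mirrors what ctx.sync() does internally):

```cpp
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>

// Run inference and block until all queued GPU work has finished before
// copying the result back. Device type/id are assumptions (OpenCL device 0).
void run_synchronously(tvm::runtime::Module mod,
                       DLTensor* gpu_y, DLTensor* cpu_y) {
  mod.GetFunction("run")();                   // enqueue the GPU work
  TVMSynchronize(kDLOpenCL, /*device_id=*/0,
                 /*stream=*/nullptr);         // wait: null = default stream
  TVMArrayCopyFromTo(gpu_y, cpu_y, nullptr);  // copies a finished result
}
```

With the synchronize call in place, the time measured around run() plus TVMSynchronize() should reflect the actual GPU execution time, and the copy should no longer absorb the kernel time.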