Problem uploading kernel shared object to remote device via RPC

Hi folks, I’m working on getting RPC working with an internal Linux port for Hexagon as per my previous thread, and I’ve managed to successfully build the kernel as a shared object for uploading to the TVM runtime on my Hexagon Linux target.

I’m now running into an obscure error when I attempt to upload the kernel shared object to the target via RPC:

Traceback (most recent call last):
  File "hexagon_linux_rpc.py", line 87, in <module>
    remote.upload(path,remotepath)
  File "/prj/dsp/qdsp6/austin/users/mariog/work/tvm/python/tvm/rpc/client.py", line 86, in upload
    self._remote_funcs["upload"](target, blob)
  File "/prj/dsp/qdsp6/austin/users/mariog/work/tvm/python/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/prj/dsp/qdsp6/austin/users/mariog/work/tvm/python/tvm/_ffi/base.py", line 72, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: [16:03:39] /prj/dsp/qdsp6/austin/users/mariog/work/tvm/src/runtime/rpc/rpc_session.cc:942: Check failed: code == RPCCode::kReturn code=4

What could be going wrong in this case? Here’s the relevant portion of my script:

import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import util
from tvm.contrib import cc

M = 512
K = 512
N = 512

tgt = 'llvm -target=hexagon-unknown-elf -mcpu=hexagonv60 -mattr=+hvx,+hvx-length64b -mattr=-small-data'
ctx = tvm.context(tgt, 0)
dtype = "int8"

# randint excludes `high`, so values stay within the int8 range [-128, 127]
a = tvm.nd.array(np.random.randint(low=-128, high=128, size=(M, K)).astype(dtype), ctx)
b = tvm.nd.array(np.random.randint(low=-128, high=128, size=(K, N)).astype(dtype), ctx)
c = tvm.nd.array(np.zeros((M, N), dtype=dtype), ctx)
k = tvm.reduce_axis((0, K), 'k')
A = tvm.placeholder((M, K), name='A', dtype=dtype)
B = tvm.placeholder((K, N), name='B', dtype=dtype)
C = tvm.compute(
           (M, N),
           lambda x, y: tvm.sum(A[x, k] * B[k, y], axis=k),
           name='C')

with tvm.build_config(data_alignment=64):
    bn = 64
    s = tvm.create_schedule(C.op)
    xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    k, = s[C].op.reduce_axis
    ko, ki = s[C].split(k, factor=16)
    s[C].reorder(xo, yo, ko, ki, xi, yi)
    s[C].vectorize(yi)
    f = tvm.build(s, [A, B, C], target=tgt, name='mmult')
    assert f
    temp = util.tempdir()
    path = temp.relpath("mmult.so")
    f.export_library(path,cc.create_shared,cc="/prj/qct/coredev/hexagon/austin/teams/hexagon-linux/builds/latest83/host/usr/bin/hexagon-linux-clang++")
    host = '10.222.142.41'
    port = 9090
    remote = rpc.connect(host, port)
    ctx = remote.cpu()
    remote.upload(path)  #<-- here's where it dies
    func = remote.load_module("mmult.so")  # load_module takes only the remote path

Thanks,
Mario

Is it possible to inspect what is happening on the RPC device (e.g., do you have a shell that the RPC server can print messages to)? Unfortunately, these types of errors are somewhat difficult to debug from the RPC client side.

One quick thing to try first is to increase the RPC timeout, in case it is set very aggressively and you are unsure how long the uploaded module is expected to run.

Hi, thanks for the reply. I do have access to a shell on the RPC device. How do I enable logging or printing of messages?
Also, how do I increase the RPC timeout limit?

Ah, sorry, I did not read the original post thoroughly enough. The timeout should not be the problem here, since the failure occurs during upload.
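On the `code=4` in the failed check: it is the integer value of an `RPCCode` enum member that the client received where it expected `kReturn`. The sketch below mirrors the first few members of that enum as I recall them from `src/runtime/rpc/rpc_session.h`; the ordering is an assumption and should be confirmed against the header in your own checkout. The helper `describe_rpc_code` is a hypothetical name, not part of TVM.

```python
from enum import IntEnum


class RPCCode(IntEnum):
    # ASSUMPTION: mirrors the leading members of `enum class RPCCode` in
    # src/runtime/rpc/rpc_session.h. Verify the order in your checkout.
    kNone = 0
    kCallFunc = 1
    kReturn = 2
    kException = 3
    kShutdown = 4
    kCopyFromRemote = 5
    kCopyToRemote = 6
    kCopyAck = 7


def describe_rpc_code(code):
    """Return the enum name for a numeric RPC code, or a placeholder if unknown."""
    try:
        return RPCCode(code).name
    except ValueError:
        return "unknown({})".format(code)


print(describe_rpc_code(4))
```

If that mapping holds for your version, the client saw a shutdown where it expected a return, which would suggest that the server side closed the connection during the upload (for example, because the process serving the session exited). That makes the server's own output the most useful place to look next.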

You should be able to see the standard error messages on the device side in the shell that runs the RPC server. If no messages appear, then unfortunately we might need to manually instrument some of the existing RPC server code.
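Before instrumenting anything, it may help to make sure no server output is being lost. Here is a minimal sketch of a wrapper that runs the server command and copies its combined stdout and stderr to both the console and a log file. The function name `run_and_log` and the log file name are hypothetical, and the wrapper assumes Python is available on the device.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run `cmd`, echoing its stdout and stderr to the console and to `log_path`.

    Returns the exit code of the child process.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            universal_newlines=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the log current even if the server dies abruptly
        return proc.wait()


# Example (ASSUMPTION: the server is started through the Python entry point;
# substitute whatever command you actually use on the device):
# run_and_log([sys.executable, "-m", "tvm.exec.rpc_server",
#              "--host", "0.0.0.0", "--port", "9090"], "rpc_server.log")
```

A non-zero exit code or a signal-style negative return value from `run_and_log` right after the upload attempt would be a hint that the server process itself is crashing rather than reporting an error.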

I see no messages in the terminal running the RPC shell beyond a one-line message indicating a connection from my host, so I can try instrumenting the RPC server. Which portions of the RPC server code do you think would be best to instrument for debugging purposes?

I would check three things: whether the connection is handled successfully when the client connects to upload a file, whether the process(es) handling the task(s) join correctly, and whether the upload function runs properly on the RPC server’s end.

Note that the upload function is in C++, so instrumenting it will require recompiling the TVM runtime for the RPC server.
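A cheaper check that needs no recompilation is to see whether any upload works at all. The sketch below first uploads a few bytes under a throwaway name and then the real library, and reports which step failed. It assumes `remote` is the session object returned by `rpc.connect` and that its `upload` accepts either a `bytearray` plus `target` or a local file path; `probe_upload` and `upload_probe.txt` are hypothetical names.

```python
import os


def probe_upload(remote, so_path, probe_name="upload_probe.txt"):
    """Upload a tiny payload, then the real file; report the first failing step."""
    if not os.path.isfile(so_path):
        raise FileNotFoundError(so_path)
    size = os.path.getsize(so_path)
    steps = [
        # A few bytes under a throwaway name exercises the upload path itself.
        ("probe", lambda: remote.upload(bytearray(b"ping"), target=probe_name)),
        # The real shared object, uploaded under its own basename.
        ("library", lambda: remote.upload(so_path)),
    ]
    for name, step in steps:
        try:
            step()
        except Exception as err:
            # Stop at the first failure: the session may be unusable afterwards.
            return {"failed": name, "size": size, "error": str(err)}
    return {"failed": None, "size": size, "error": None}
```

If the probe step already fails, the problem is likely in the server's upload handling in general (for instance, where it writes temporary files). If only the library step fails, the size reported in the result is worth comparing against what the target can accept.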

Thanks for the tips, I’ll give these a try tomorrow.
