Modularize RPC Infra

RPC plays a key role in the TVM ecosystem by enabling remote profiling.

The current RPC implementation contains two components: RPCSession, which implements the server/client logic as well as parameter translation (e.g. translating a local handle to a remote one), and RPCModule, which exposes the session's low-level remote calls in a form wrapped by TVM's runtime data structures (e.g. PackedFunc).

While the current design serves most of our needs, it has certain limitations. In particular, RPCSession conflates several roles: (1) transporting requests over a communication channel; (2) translating arguments (e.g. translating a remote.NDArray to its handle); (3) delegating calls to the runtime on the remote.

This RFC proposes to further modularize the RPC design into four major components:

  • RPCSession: represents the set of features that need to be implemented.
  • RPCEndPoint: Endpoint that forwards RPCSession requests over a communication channel to another remote RPCSession.
  • RPCChannel: The communication channel between two endpoints. Currently we have sockets; we could add more channels, such as stdio, for cases that do not support networking.
  • RPCModule: Exposes an RPCSession as an rpc device in the TVM Runtime API.

The most important data structure is RPCSession, which is listed below:

/*!
 * \brief The interface of all remote RPC sessions.
 *
 *  It contains all the necessary interface to implement
 *  remote call and resource management.
 *  The interface is designed to allow easy proxy-chaining
 *  by forwarding requests to another RPCSession.
 */
class RPCSession {
 public:
  /*! \brief PackedFunc Handle in the remote. */
  using PackedFuncHandle = void*;

  /*! \brief Module handle in the remote. */
  using ModuleHandle = void*;

  /*! \brief NDArray handle in the remote. */
  using NDArrayHandle = void*;

  /*!
   * \brief Callback to send the encoded return values via encoded_args.
   * \param encoded_args The arguments that we can encode the return values into.
   * \param ret_tcode The actual remote type code of the return value.
   *
   * Encoding convention (as a list of arguments):
   * - str/float/int/byte: [tcode: int, value: TVMValue], value follows the PackedFunc convention.
   * - PackedFunc/Module: [tcode: int, handle: void*]
   * - NDArray: [tcode: int, meta: DLTensor*, nd_handle: void*]
   *            The DLTensor* contains the meta-data as well as the handle to the remote data.
   *            nd_handle can be used for deletion.
   */
  using FEncodeReturn = std::function<void(TVMArgs encoded_args)>;

  /*! \brief Destructor. */
  virtual ~RPCSession() {}

  /*!
   * \brief Get function in the session.
   * \param name The name of the function.
   * \return The function handle.
   */
  virtual PackedFuncHandle GetFunction(const std::string& name) = 0;

  /*!
   * \brief Call into a remote Packed function.
   *
   *  Calling convention:
   *  - type_code follows the PackedFunc convention.
   *  - int/float/string/bytes follow the PackedFunc convention; all data are local.
   *  - PackedFunc/Module and future remote objects: pass the remote handle instead.
   *  - NDArray/DLTensor: pass a DLTensor pointer, whose data field
   *                      points to a remote data handle returned by the Device API.
   *                      The meta-data of the DLTensor sits locally.
   *
   *  The caller populates the arguments and manages these arguments.
   *  The callee can change the content of arg_values and arg_type_codes
   *  if they want to do in-place modification and forwarding.
   *
   *  The callee needs to store the return value into ret_value.
   *  - PackedFunc/Module are stored as void*.
   *  - NDArray is stored as a local NDArray, whose data field is a remote handle.
   *    Notably, the NDArray's deleter won't delete the remote handle.
   *    It is up to the user of the RPCSession to perform such wrapping.
   *  - In short, remote handles are "moved" as return values,
   *    and the callee needs to explicitly manage them by calling
   *    the deleter functions when they are no longer needed.
   *
   * \param func The function handle.
   * \param arg_values The argument values.
   * \param arg_type_codes The type codes of the arguments.
   * \param num_args Number of arguments.
   * \param fencode_return The function to set the return value;
   *                       if not called, the return value is null.
   */
  virtual void CallFunc(PackedFuncHandle func,
                        const TVMValue* arg_values,
                        const int* arg_type_codes,
                        int num_args,
                        const FEncodeReturn& fencode_return) = 0;

  /*!
   * \brief Copy bytes into remote array content.
   * \param local_from The source host data.
   * \param local_from_offset The byte offset in the source.
   * \param remote_to The target array.
   * \param remote_to_offset The byte offset in the target.
   * \param nbytes The size of the memory in bytes.
   * \param remote_ctx_to The target context.
   * \param type_hint Hint of content data type.
   */
  virtual void CopyToRemote(void* local_from,
                            size_t local_from_offset,
                            void* remote_to,
                            size_t remote_to_offset,
                            size_t nbytes,
                            TVMContext remote_ctx_to,
                            DLDataType type_hint) = 0;

  /*!
   * \brief Copy bytes from remote array content.
   * \param remote_from The source remote data.
   * \param remote_from_offset The byte offset in the source.
   * \param local_to The target host array.
   * \param local_to_offset The byte offset in the target.
   * \param nbytes The size of the memory in bytes.
   * \param remote_ctx_from The source context in the remote.
   * \param type_hint Hint of content data type.
   */
  virtual void CopyFromRemote(void* remote_from,
                              size_t remote_from_offset,
                              void* local_to,
                              size_t local_to_offset,
                              size_t nbytes,
                              TVMContext remote_ctx_from,
                              DLDataType type_hint) = 0;

  /*!
   * \brief Free a remote handle.
   * \param handle The remote handle; can be an NDArray/PackedFunc/Module.
   * \param type_code The type code of the underlying type.
   */
  virtual void FreeHandle(void* handle, int type_code) = 0;

  /*!
   * \brief Get a device API that represents the
   *  actions that can be taken on the remote.
   *
   *  The caller can then call into the Alloc/Free functions
   *  to allocate and free spaces, taking the pointer as the handle.
   *
   *  The device API is guaranteed to be alive during the
   *  lifetime of the Session.
   *
   * \param ctx The remote context.
   * \param allow_missing Whether to allow returning nullptr if it is not available.
   * \return The device API.
   */
  virtual DeviceAPI* GetDeviceAPI(TVMContext ctx, bool allow_missing = false) = 0;

  /*!
   * \return The session table index of the session.
   */
  int table_index() const { return table_index_; }

  /*!
   * \brief Try to get a session from the global session table by table index.
   * \param table_index The table index of the session.
   * \return The shared_ptr to the session; can be nullptr.
   */
  static std::shared_ptr<RPCSession> Get(int table_index);

 private:
  /*! \brief Index of this session in the RPC session table. */
  int table_index_{0};
  /*! \brief Insert the current session into the session table. */
  static void InsertToSessionTable(std::shared_ptr<RPCSession> sess);
  // friend declaration
  friend Module CreateRPCSessionModule(std::shared_ptr<RPCSession> sess);
};

RPCSession becomes a purely abstract interface that can be implemented by different providers:

  • The local machine is a special case (LocalSession) that delegates all calls into the local runtime.
  • The RPC client is another subclass (RPCClientSession) that delegates the session calls into the session provided by the other end of the RPCEndPoint.
  • The session provider on the RPC server can be configured via a session_constructor argument, which allows us to use a session provider other than the default one on the remote machine.
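
To make the interface concrete, here is a toy Python sketch (illustrative only; `Session` and `DemoLocalSession` are hypothetical names, not TVM classes) of how a LocalSession-style provider implements the abstract session interface by delegating to the local environment:

```python
# Toy illustration only -- not TVM's actual implementation.
from abc import ABC, abstractmethod


class Session(ABC):
    """Abstract session interface: every provider implements the same calls."""

    @abstractmethod
    def get_function(self, name):
        """Return an opaque handle for the named function."""

    @abstractmethod
    def call_func(self, handle, *args):
        """Invoke a function handle with already-translated arguments."""


class DemoLocalSession(Session):
    """The local machine as a special case: delegate to a local table."""

    def __init__(self):
        self._table = {"add": lambda a, b: a + b}

    def get_function(self, name):
        # Locally, the handle can simply be the function itself.
        return self._table[name]

    def call_func(self, handle, *args):
        return handle(*args)


sess = DemoLocalSession()
f = sess.get_function("add")
print(sess.call_func(f, 1, 2))  # -> 3
```

Because every provider exposes the same interface, callers such as RPCModule never need to know whether the session is local or remote.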

The normal RPC communication path can now be summarized as follows.

client -> ClientSession -> EndPoint[client@n0]
-> networking[between n0 <=> n1]
-> EndPoint[server@n1] -> LocalSession[@n1]

Because of the new modular design, we can now chain more sessions together. For example, we can now run the following proxy setup:

client_via_proxy = rpc.connect(
    proxy_server_url, proxy_server_port, proxy_server_key,
    session_constructor_args=["rpc.Connect", internal_url, internal_port, internal_key])

client -> ClientSession -> EndPoint[client@n0]
-> networking[between n0 <=> n1]
-> EndPoint[server@n1] -> ClientSession -> EndPoint[client@n1]
-> networking[between n1 <=> n2]
-> EndPoint[server@n2] -> LocalSession[@n2]
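
The chaining works because the client-side session exposes the same interface as any other session, so the server side of one endpoint can simply delegate into another client session. A toy Python sketch of this composition (hypothetical names, not TVM code):

```python
# Toy illustration of session chaining -- not TVM's actual code.
class DemoLocalSession:
    """Terminal session: executes calls against a local function table."""

    def __init__(self, table):
        self._table = table

    def call_named(self, name, *args):
        return self._table[name](*args)


class DemoClientSession:
    """A session that forwards every call to the session on the other side.

    The `remote_session` attribute stands in for the EndPoint/Channel/EndPoint
    hop; because this class exposes the same interface as DemoLocalSession,
    sessions can be chained to arbitrary depth.
    """

    def __init__(self, remote_session):
        self._remote = remote_session

    def call_named(self, name, *args):
        return self._remote.call_named(name, *args)


# client -> ClientSession@n0 -> ClientSession@n1 -> LocalSession@n2
n2 = DemoLocalSession({"mul": lambda a, b: a * b})
n1 = DemoClientSession(n2)
n0 = DemoClientSession(n1)
print(n0.call_named("mul", 3, 4))  # -> 12
```

Each extra hop just wraps the previous session, which is exactly what makes the proxy path above possible.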

We can also implement other types of sessions. For example, we could make the uTVM session a special case of RPCSession and use the same mechanism for session management. We could also implement a stdio session that directly starts a specified new program (as defined by the session constructor).
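As a sketch of the stdio idea: the "remote" can be a child process started by the session constructor, with its stdin/stdout serving as the channel, so no networking is required. The snippet below is an illustrative toy in Python (names such as `StdioChannel` and `CHILD` are hypothetical, not TVM APIs):

```python
# Toy sketch of a stdio-backed channel: the "remote" is a child process
# started by the session constructor, and the channel is its stdin/stdout.
import subprocess
import sys

# A trivial "remote server": reads integers from stdin, replies with 2*x.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(str(2 * int(line)) + '\\n')\n"
    "    sys.stdout.flush()\n"
)


class StdioChannel:
    """Channel over a child process's stdio pipes."""

    def __init__(self, argv):
        self._proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def send(self, msg):
        self._proc.stdin.write(msg + "\n")
        self._proc.stdin.flush()

    def recv(self):
        return self._proc.stdout.readline().strip()

    def close(self):
        self._proc.stdin.close()
        self._proc.wait()


chan = StdioChannel([sys.executable, "-c", CHILD])
chan.send("21")
print(chan.recv())  # -> 42
chan.close()
```

An RPCEndPoint built on such a channel would serialize session requests into the pipe instead of a socket; the rest of the stack is unchanged.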


Here is an implementation of the above proposal.