Is it possible to embed TVM into Tensorflow Serving?

I attempted and succeeded with an initial version.

Awesome! Can you please share a tutorial? It would be really appreciated.

I am also interested in this. It could be very useful in a cloud server environment.

@daweili1226 & @FrozenGene
You may need to wait a while, as I am checking on the approvals to upstream it.

Hi @srkreddy1238
How is it going with TVM in TF Serving?

Good to go now.

Will upstream in a week or 10 days.

Thank you so much, good news at the start of my day!

@daweili1226 check out the branch below for TVM on TF Serving. This is an initial version; some standardisation and enhancements are still to be done.

Find below a Python notebook demonstrating the same.
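
For context, producing the artifacts the servable loads looks roughly like this; a sketch against the 0.5-era Relay API, where `func`, `params` (from a frontend importer), and the file names are illustrative assumptions, not taken from the branch:

```python
# Hedged sketch: compile a Relay function and export the three artifacts
# the TVM graph runtime loads at serving time (file names are assumptions).
import tvm
from tvm import relay

# Assume func, params came from a frontend importer, e.g.
# relay.frontend.from_tensorflow(graph_def, ...).
graph_json, lib, params = relay.build(func, target="llvm", params=params)

lib.export_library("model.so")              # compiled operator library
with open("model.json", "w") as f:
    f.write(graph_json)                     # graph structure for the runtime
with open("model.params", "wb") as f:
    f.write(relay.save_param_dict(params))  # serialized weights
```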

Thanks for sharing.

@srkreddy1238

Are there any performance benchmarks for this TVM integration into TensorFlow Serving?

Also, I am wondering whether there is any apples-to-apples comparison between this integration and the TF XLA backend support in your serving scenario.

I think the TF Serving framework doesn't increase or decrease the performance numbers compared to standalone TVM vs. TF numbers.

This is a basic version of a TVM Servable in TF Serving, verified for CPU. I think the device context is something to be passed through configuration for other backends.
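
What the servable does internally corresponds roughly to the following Python sketch of the graph runtime flow (the servable itself is C++ against the equivalent runtime API; the file names and the input name `data` are illustrative assumptions):

```python
# Hedged sketch of the graph runtime flow inside the servable,
# using the 0.5-era Python API; shapes and names are assumptions.
import numpy as np
import tvm
from tvm.contrib import graph_runtime

ctx = tvm.cpu(0)  # this initial version is verified for CPU only
lib = tvm.module.load("model.so")
with open("model.json") as f:
    graph_json = f.read()
with open("model.params", "rb") as f:
    param_bytes = f.read()

module = graph_runtime.create(graph_json, lib, ctx)
module.load_params(param_bytes)

# Run one inference on a dummy input.
x = np.zeros((1, 3, 224, 224), dtype="float32")
module.set_input("data", tvm.nd.array(x, ctx))
module.run()
print(module.get_output(0).asnumpy().shape)
```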

I think this comes down to the inference performance of TF-XLA vs. TVM, not a TF Serving problem. Performance numbers can be found here: https://arxiv.org/pdf/1802.04799.pdf

@srkreddy1238 how do you build TF Serving with TVM support?

In your example you simply call the tf_model_server binary; how did you build it? And is it possible to use a Docker image instead?

Thanks for sharing, @srkreddy1238.

Does the commit depend on any other code of yours? Can I merge it into my TF Serving code and compile it?

The binary is built by patching the above change into tensorflow_serving. Yes, we can build a Docker image out of the box, as TF Serving already supports it.
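
Once the patched binary is up (directly or in Docker), it can be queried like any other TF Serving model. A minimal sketch, assuming the server was started with --rest_api_port=8501 and serves a model named tvm_mnist (both illustrative assumptions):

```python
# Hedged sketch: query the patched server over TF Serving's REST API;
# model name, port, and input shape are illustrative assumptions.
import json
import requests

payload = {"instances": [[0.0] * 784]}  # one dummy flattened 28x28 input
resp = requests.post(
    "http://localhost:8501/v1/models/tvm_mnist:predict",
    data=json.dumps(payload),
)
print(resp.json())
```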

Yes, this patch should work on TF Serving 1.12 directly.

Must the TVM version be 0.1, or could it be 0.6 or the latest version?

I think the TVM version at that time was 0.5, but I don't think the graph runtime (the only dependency for TF Serving) has had major changes since. You may try 0.6 first.

TF Serving has also advanced to newer versions now; I may try upgrading later.
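
Before wiring it into TF Serving, you can sanity-check compatibility by loading the 0.5-built artifacts under the TVM runtime you plan to use; a minimal sketch (file names are placeholders):

```python
# Hedged sketch: check that artifacts built with TVM 0.5 load under the
# installed runtime (0.5/0.6-era API; file names are assumptions).
import tvm
from tvm.contrib import graph_runtime

lib = tvm.module.load("model.so")
with open("model.json") as f:
    graph = f.read()
module = graph_runtime.create(graph, lib, tvm.cpu(0))
with open("model.params", "rb") as f:
    module.load_params(f.read())
print("graph runtime created and parameters loaded")
```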

Thank you very much! I will try it.

If there are any results, I will give you feedback.

---

I successfully loaded the TVM model with TF Serving.

The TVM version must be v0.5. I think the problem is in the graph-metadata-API.diff file.

When I have time, I will try to change the code to support the latest version.

You may try fixing the patch failures manually or fall back to version 0.5.