I want to deploy inference for a TVM-optimized TensorFlow model on a Linux GPU server, exposed as an HTTP service. Is there a framework that can load a TVM-exported model, similar to tensorrt-inference-server?
You can load the TVM model with Python and start an HTTP service to serve it.
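A minimal sketch of that approach, using only the Python standard library for the HTTP part: the file names (`deploy_lib.so`, `deploy_graph.json`, `deploy_params.params`) and the input name `input_1` are placeholders for whatever your TVM export actually produced, and the exact runtime API differs slightly between TVM versions (noted in comments).

```python
# Sketch: serve a TVM-compiled model over HTTP with the standard library.
# All file names and the input name "input_1" are assumptions; replace them
# with the artifacts from your own TVM export.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_input(body: bytes):
    """Parse a JSON request body like {"data": [[...]]} into a nested list."""
    return json.loads(body)["data"]


def encode_output(out):
    """Serialize a model output (numpy array or plain list) to JSON bytes."""
    values = out.tolist() if hasattr(out, "tolist") else out
    return json.dumps({"output": values}).encode()


def load_tvm_module(lib_path, graph_path, params_path):
    """Load a TVM-exported model; TVM is imported lazily so the rest of the
    sketch can be read and tested without TVM installed."""
    import tvm
    from tvm.contrib import graph_executor  # older TVM: tvm.contrib.graph_runtime

    lib = tvm.runtime.load_module(lib_path)
    dev = tvm.cuda(0)  # older TVM: tvm.gpu(0)
    module = graph_executor.create(open(graph_path).read(), lib, dev)
    module.load_params(open(params_path, "rb").read())
    return module


class PredictHandler(BaseHTTPRequestHandler):
    module = None  # assign a loaded TVM module before serving

    def do_POST(self):
        import numpy as np
        length = int(self.headers["Content-Length"])
        data = decode_input(self.rfile.read(length))
        self.module.set_input("input_1", np.asarray(data, dtype="float32"))
        self.module.run()
        # older TVM: .asnumpy() instead of .numpy()
        body = encode_output(self.module.get_output(0).numpy())
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    PredictHandler.module = load_tvm_module(
        "deploy_lib.so", "deploy_graph.json", "deploy_params.params")
    HTTPServer(("0.0.0.0", 8500), PredictHandler).serve_forever()
```

A client would then POST `{"data": [[...]]}` to the server and read the `"output"` field of the JSON response. This is a single-threaded sketch; for production you would put it behind a proper WSGI server or use a dedicated serving framework.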
We are also thinking about adding a TensorFlow custom op that runs TVM ops, so that you could run all the ops inside a TensorFlow session, which already has serving solutions such as TensorFlow Serving and simple_tensorflow_serving.
Related proposal: Add TensorFlow custom op and run tvm in TensorFlow.