How to serialize graph runtime (or relay.build() result)

Hi,

I’m looking at deploy_resnet_on_vta.py and am curious whether there is a way to serialize and cache the compilation result from relay.build().
I just want to avoid repeating the graph_runtime generation process on every run (it already takes almost 6 seconds for VTA).

# The one-time cost seems to run from here ================
relay.frontend.from_mxnet(gluon_model, shape)
relay.quantize(...)
...
graph, lib, params = relay.build(...)
...
m = graph_runtime.create(graph, lib, ctx)
# ... to here =========================================

m.set_input(**params)
m.set_input('data', image)
m.run()

I think a method like “graph_runtime.cache()” would be a natural way to cut this latency and serve model inference efficiently.
Is a feature like this in development, or did I just miss it?

I also tried doing it manually with a simple Python library like pickle.
However, of the three components returned from relay.build() (graph, lib, params),
I’m not able to serialize “params”, which is a packed weight array that originates from ctypes.

Is there any way to do this?

Thank you for reading,
OYH

I found a solution and am logging it here for other people.
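
For reference, here is a minimal sketch of one way to persist the three relay.build() outputs; it is not necessarily the exact solution mentioned above. It assumes a TVM version from around the time of this thread, where relay.build() returns (graph, lib, params) with graph as a JSON string, and it relies on lib.export_library(), relay.save_param_dict(), tvm.module.load() and the graph runtime's load_params(). The file names and the helpers artifact_paths, is_cached, save_build and load_build are made up for illustration. It also covers only the local case: with VTA over RPC the library is normally uploaded and loaded through the remote session, so that part would need adapting.

```python
import os


def artifact_paths(out_dir):
    """Return the (arbitrary, illustrative) file names used for the cache."""
    return {
        "lib": os.path.join(out_dir, "deploy_lib.so"),
        "graph": os.path.join(out_dir, "deploy_graph.json"),
        "params": os.path.join(out_dir, "deploy_param.params"),
    }


def is_cached(out_dir):
    """True only if all three artifacts already exist on disk."""
    return all(os.path.isfile(p) for p in artifact_paths(out_dir).values())


def save_build(graph, lib, params, out_dir):
    """Write the outputs of relay.build() to out_dir."""
    from tvm import relay  # lazy import: the helpers above work without TVM

    os.makedirs(out_dir, exist_ok=True)
    paths = artifact_paths(out_dir)
    lib.export_library(paths["lib"])  # compiled operators as a shared library
    with open(paths["graph"], "w") as f:
        f.write(graph)  # assumed to be the graph JSON string
    with open(paths["params"], "wb") as f:
        f.write(relay.save_param_dict(params))  # params serialized to bytes


def load_build(out_dir, ctx):
    """Recreate a graph runtime module from the cached artifacts."""
    import tvm
    from tvm.contrib import graph_runtime

    paths = artifact_paths(out_dir)
    with open(paths["graph"]) as f:
        graph = f.read()
    with open(paths["params"], "rb") as f:
        param_bytes = bytearray(f.read())
    # Assumption: older TVM API; newer releases expose tvm.runtime.load_module.
    lib = tvm.module.load(paths["lib"])
    m = graph_runtime.create(graph, lib, ctx)
    m.load_params(param_bytes)  # replaces m.set_input(**params)
    return m
```

With these helpers the script can check is_cached(out_dir) first, run the frontend import, quantization and relay.build() only on a miss and pass the result to save_build(), and otherwise go straight to load_build() followed by m.set_input('data', image) and m.run().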
