I was profiling the graph runtime for memory allocations and noticed that malloc() was being called inside tvm.contrib.graph_runtime.run(). The distribution of allocation sizes during run() is as follows:
- 0 - 1 KB: 366 times
- 1 - 5 KB: 20 times
- 15 - 20 KB: 18 times
- 600 - 700 KB: 1 time
- 800 - 900 KB: 1 time
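For reference, a rough sketch of how such a histogram can be tallied from the tracer's output is shown below. The log format is an assumption (one `malloc <size>` line per allocation); log-malloc-simple's actual output may differ.

```python
# Sketch of the bucketing above. Assumption: the trace file contains one
# "malloc <size>" line per allocation (log-malloc-simple's real format
# may differ).
import bisect

bucket_limits = [1 * 1024, 5 * 1024, 20 * 1024, 700 * 1024, 900 * 1024]
counts = [0] * (len(bucket_limits) + 1)  # last slot catches anything larger

with open('malloc-trace.log') as log:
    for line in log:
        if line.startswith('malloc'):
            size = int(line.split()[1])
            counts[bisect.bisect_left(bucket_limits, size)] += 1
```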
Here is the call stack of the largest memory allocation, which allocates 844 KB:
```
/home/ubuntu/tvm/build/libtvm.so(tvm::runtime::CPUDeviceAPI::AllocDataSpace(DLContext, unsigned long, unsigned long, DLDataType)+0x43) [0x7f2325fd3add]
/home/ubuntu/tvm/build/libtvm.so(tvm::runtime::WorkspacePool::Pool::Alloc(DLContext, tvm::runtime::DeviceAPI*, unsigned long)+0x296) [0x7f2325ff27b8]
/home/ubuntu/tvm/build/libtvm.so(tvm::runtime::WorkspacePool::AllocWorkspace(DLContext, unsigned long)+0xf9) [0x7f2325ff228f]
/home/ubuntu/tvm/build/libtvm.so(tvm::runtime::CPUDeviceAPI::AllocWorkspace(DLContext, unsigned long, DLDataType)+0x32) [0x7f2325fd34b4]
/home/ubuntu/tvm/build/libtvm.so(TVMBackendAllocWorkspace+0x6d) [0x7f2325fd0099]
./model/compiled.so(+0x50e6) [0x7f2380f4b0e6]
/home/ubuntu/tvm/build/libtvm.so(+0x1e3d2f9) [0x7f2325fe12f9]
/home/ubuntu/tvm/build/libtvm.so(+0x1e3d6b0) [0x7f2325fe16b0]
/home/ubuntu/tvm/build/libtvm.so(std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x5a) [0x7f23255571f6]
/home/ubuntu/tvm/build/libtvm.so(tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x30) [0x7f23256802a6]
/home/ubuntu/tvm/build/libtvm.so(+0x1e82019) [0x7f2326026019]
/home/ubuntu/tvm/build/libtvm.so(+0x1e846bb) [0x7f23260286bb]
/home/ubuntu/tvm/build/libtvm.so(std::function<void ()>::operator()() const+0x32) [0x7f23254df52c]
/home/ubuntu/tvm/build/libtvm.so(tvm::runtime::GraphRuntime::Run()+0x79) [0x7f23260238ff]
/home/ubuntu/tvm/build/libtvm.so(+0x1e82ce0) [0x7f2326026ce0]
/home/ubuntu/tvm/build/libtvm.so(+0x1e84ef8) [0x7f2326028ef8]
/home/ubuntu/tvm/build/libtvm.so(std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x5a) [0x7f23255571f6]
/home/ubuntu/tvm/build/libtvm.so(tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x30) [0x7f23256802a6]
/home/ubuntu/tvm/build/libtvm.so(TVMFuncCall+0x95) [0x7f2325fd0225]
/usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f2329cabdae]
/usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f2329cab71f]
/usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2b4) [0x7f2329ebf524]
/usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x11b93) [0x7f2329ebfb93]
python3(_PyObject_FastCallKeywords+0x19c) [0x5a730c]
python3() [0x503073]
python3(_PyEval_EvalFrameDefault+0x449) [0x506859]
python3() [0x504c28]
```
Note that the compiled model library (compiled.so) makes a call to TVMBackendAllocWorkspace(), which ends up allocating memory via malloc() (reading the trace from the innermost frame at the top: CPUDeviceAPI::AllocDataSpace is reached from the compiled kernel through WorkspacePool). Strictly speaking, malloc() is a library function rather than a system call, but it may issue system calls such as brk/mmap and take heap locks, so it can add latency and jitter at inference time.
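The path above goes through TVM's WorkspacePool, which caches workspaces across calls rather than freeing them eagerly. The sketch below is a toy model of that idea (for illustration only, not TVM's actual implementation); it shows why the heap allocation surfaces inside the first run(): the pool is filled lazily, so a cache miss falls through to the allocator.

```python
# Toy model of a lazily-filled workspace pool (illustration only, not
# TVM's actual WorkspacePool code).
class ToyWorkspacePool:
    def __init__(self):
        self._free = []  # buffers handed back by earlier FreeWorkspace calls

    def alloc_workspace(self, nbytes):
        for buf in self._free:
            if len(buf) >= nbytes:   # cache hit: reuse without touching the heap
                self._free.remove(buf)
                return buf
        return bytearray(nbytes)     # cache miss: this is where malloc() happens

    def free_workspace(self, buf):
        self._free.append(buf)       # keep the buffer around for the next run()
```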
An object dump of compiled.so confirms that the compiled code calls TVMBackendAllocWorkspace():
```
2125: 48 8b 05 94 3e 23 00 mov 0x233e94(%rip),%rax # 235fc0 <__TVMBackendAllocWorkspace@@Base-0x88>
212c: bf 01 00 00 00 mov $0x1,%edi
2131: ba 2c 9a 09 00 mov $0x99a2c,%edx
2136: b9 02 00 00 00 mov $0x2,%ecx
213b: 41 b8 20 00 00 00 mov $0x20,%r8d
2141: 44 89 f6 mov %r14d,%esi
2144: ff 10 callq *(%rax)
```
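Decoding the argument registers under the x86-64 SysV calling convention (rdi, rsi, rdx, rcx, r8), and assuming this call site follows the standard TVMBackendAllocWorkspace(device_type, device_id, nbytes, dtype_code_hint, dtype_bits_hint) signature, gives a concrete picture of the request. Note that this particular site asks for about 615 KB, so it appears to be one of the other workspace allocations in the histogram rather than the 844 KB one:

```python
# Arguments at the call site, per SysV register order (assuming the
# standard TVMBackendAllocWorkspace signature: device_type, device_id,
# nbytes, dtype_code_hint, dtype_bits_hint).
device_type = 0x1        # %edi: kDLCPU
# device_id comes from %r14d and is only known at run time
nbytes = 0x99a2c         # %edx: 629292 bytes
dtype_code_hint = 0x2    # %ecx: kDLFloat
dtype_bits_hint = 0x20   # %r8d: 32-bit elements
print(nbytes / 1024)     # ~614.5 KB, i.e. the 600-700 KB bucket above
```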
We may want to pre-allocate all buffers to avoid malloc() at inference time.
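One cheap mitigation along these lines (an idea sketched here, not an existing TVM option): since the workspace pool keeps its buffers after first use, a single dummy run() at startup should absorb the malloc() cost before any latency-sensitive request arrives. Reusing the variables from the reproduction script below:

```python
# Hypothetical warm-up step (not an official TVM API); m, ctx and
# input_shape come from the reproduction script below.
dummy = tvm.nd.array(np.zeros(input_shape, dtype='float32'), ctx=ctx)
m.set_input('data', dummy)
m.run()  # first run pays the allocation cost; later runs hit the pool cache
```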
How to reproduce: run the following program under the memory-tracing library https://github.com/qgears/log-malloc-simple:
```python
import numpy as np
import PIL.Image
import tvm
import tvm.contrib.graph_runtime
import nnvm
import topi

input_shape = (1, 3, 224, 224)

# Preprocess the input image to NCHW float32.
with open('./frog.jpg', 'rb') as f:
    image = PIL.Image.open(f).convert('RGB')
image = np.asarray(image.resize((input_shape[-2], input_shape[-1])), dtype=np.float32)
image = np.rollaxis(image, axis=2, start=0)[np.newaxis, :]

# Load the compiled graph, module, and parameters.
with open('./model/compiled_model.json', 'r') as f:
    graph = f.read()
lib = tvm.module.load('./model/compiled.so')
with open('./model/compiled.params', 'rb') as f:
    params = nnvm.compiler.load_param_dict(f.read())

ctx = tvm.cpu(0)
m = tvm.contrib.graph_runtime.create(graph, lib, ctx)
m.set_input(**params)
m.set_input('data', tvm.nd.array(image, ctx=ctx))
m.run()
out = m.get_output(0, tvm.nd.empty(shape=(1, 1000))).asnumpy()
```
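For completeness, here is a hypothetical launcher for running the script under the tracer. log-malloc-simple works via LD_PRELOAD, so the library must be injected before the Python process starts; the .so path and script name below are placeholders that depend on how the tracer was built and where the repro was saved.

```python
# Hypothetical launcher (paths are placeholders): inject the tracer via
# LD_PRELOAD before Python starts, since it must interpose malloc().
import os
import subprocess

env = dict(os.environ, LD_PRELOAD='./log-malloc-simple/liblog-malloc-simple.so')
subprocess.run(['python3', 'reproduce.py'], env=env, check=True)
```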
cc @kevinthesun