TVM's detection speed is much slower than MXNet's on SSD-MobileNet

First, I ran the following GluonCV demo on an RK3399 for object detection:
https://gluon-cv.mxnet.io/build/examples_detection/demo_webcam.html

Its detection speed is about 10 fps, including only frame capture, data transform, and inference. No display.

However, when I use TVM to compile the model for detection, its speed is only about 2 fps!

The following is the compile code:

#--------------------------------------------- Local PC --------------------------------------------------
import tvm
from tvm import relay
from gluoncv import model_zoo

block = model_zoo.get_model("ssd_512_mobilenet1.0_voc", pretrained=True)

dshape = (1, 3, 240, 320)
net, params = relay.frontend.from_mxnet(block, {"data": dshape})

opt_level = 3
target = tvm.target.create('llvm -device=arm_cpu -target=aarch64-linux-gnu')
with relay.build_config(opt_level=opt_level):
    graph, lib, params = relay.build(net, target, params=params)
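
The build artifacts are then exported and copied to the board; a minimal sketch of what I use for this (the file names here are just placeholders):

# Export the build artifacts on the PC (placeholder file names)
lib.export_library("deploy_lib.tar")
with open("deploy_graph.json", "w") as f:
    f.write(graph)
with open("deploy_param.params", "wb") as f:
    f.write(relay.save_param_dict(params))

# On the RK3399, after copying the three files over
loaded_lib = tvm.module.load("deploy_lib.tar")
loaded_graph = open("deploy_graph.json").read()
loaded_params = bytearray(open("deploy_param.params", "rb").read())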

#----------------------------------------------- RK3399 -------------------------------------------------
import time
import cv2
import mxnet as mx
import gluoncv as gcv
import matplotlib.pyplot as plt
import tvm
from tvm.contrib import graph_runtime

ctx = tvm.cpu()
# loaded_graph, loaded_lib, loaded_params are the artifacts loaded as in the sketch above
m = graph_runtime.create(loaded_graph, loaded_lib, ctx)
m.load_params(loaded_params)

axes = None
cap = cv2.VideoCapture(0)
cap.set(3, 320)  # frame width
cap.set(4, 240)  # frame height
while True:

    start = time.time()

    # Capture frame-by-frame
    ret, frame = cap.read()

    # Image pre-processing
    frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
    rgb_nd, frame = gcv.data.transforms.presets.ssd.transform_test(frame, short=240, max_size=480)

    # Do inference
    tvm_input = tvm.nd.array(rgb_nd.asnumpy(), ctx=ctx)
    m.set_input('data', tvm_input)

    # execute
    m.run()

    # get outputs
    class_IDs, scores, bounding_boxs = m.get_output(0), m.get_output(1), m.get_output(2)

    # Compute the fps
    end = time.time()
    seconds = end - start
    fps = 1 / seconds
    print("Estimated frames per second : {0}".format(fps))

    # Display result
    plt.cla()
    axes = gcv.utils.viz.plot_bbox(frame, bounding_boxs.asnumpy()[0], scores.asnumpy()[0],
                                   class_IDs.asnumpy()[0], class_names=block.classes, ax=axes)
    # plt.draw()
    plt.pause(0.001)
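
For reference, the pure graph-execution time (excluding capture, pre-processing, and display) can also be measured with TVM's built-in time evaluator; a minimal sketch reusing m and ctx from above:

# Time only graph execution; assumes an input has already been set with m.set_input
ftimer = m.module.time_evaluator("run", ctx, number=1, repeat=30)
prof_res = ftimer()
print("Mean inference time: %.2f ms" % (prof_res.mean * 1000))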

Who can help explain this issue?


I met a similar problem: with only an opt_level=3 build and no auto-tuning, the GluonCV SSD model on CUDA is 5x slower than MXNet. GluonCV should currently have full support in TVM, so is there a benchmark, test, or official speed-up ratio to share? And what might be the problem in our usage? Thanks a lot! @Laurawly

@kuonangzhe We haven't benchmarked the performance on servers yet; currently we have only focused on embedded GPUs. But I suggest auto-tuning the convolutions. We'll share the benchmarks once they are ready.


@zzw The default schedules in upstream TVM haven't been auto-tuned for the object-detection workload on arm CPU, so the inference time is not optimized. I suggest you auto-tune first.
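
A rough tuning sketch along the lines of the autotvm arm CPU tutorials, reusing net, target, and params from the compile script above (the log file name, trial count, and tuner choice are placeholders, and the RPC runner for the board is replaced by a LocalRunner placeholder):

from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner

# Extract the conv2d tasks from the relay program built above
tasks = autotvm.task.extract_from_program(net, target=target, params=params,
                                          ops=(relay.op.nn.conv2d,))

log_file = "ssd_mobilenet_rk3399.log"  # placeholder name
for task in tasks:
    tuner = XGBTuner(task, loss_type='rank')
    tuner.tune(n_trial=min(1000, len(task.config_space)),
               measure_option=autotvm.measure_option(
                   builder=autotvm.LocalBuilder(),
                   # For the RK3399 this would normally be an RPC runner
                   # pointing at the board; LocalRunner is only a placeholder.
                   runner=autotvm.LocalRunner(number=10)),
               callbacks=[autotvm.callback.log_to_file(log_file)])

# Re-compile with the tuned schedules applied
with autotvm.apply_history_best(log_file):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(net, target, params=params)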

Many thanks for the reply! The point is not about embedded GPU vs. server GPU: currently the GluonCV SSD runs much slower than MXNet whether it is on Intel CPU (llvm), arm CPU, or NVIDIA GPU (cuda), which is frustrating. I will try CUDA auto-tuning on my side to see whether it works for the SSD model, if you think that is the core solution here. I'll update when I have results. Thanks a lot! @Laurawly

I have tried to auto-tune the model (SSD-MobileNet) for object detection, but something goes wrong.

The bug of auto-tuning [parameters setting]

@Laurawly
Hi, sorry for the late reply. I used autotvm to tune the GluonCV SSD with resnet50_voc. On a 1080Ti, the inference times are as follows:
mxnet: 0.03 s
tvm: 0.8 s
tvm with autotvm: 0.6 s
So it seems there is no obvious improvement on SSD, and the speed is still really slow. Are there any other suggestions?

There is a known issue regarding concatenation performance on GPU that causes this: Explore Optimizations for Concat

I have the same problem

Have you tried updating the concat? One possible solution is making it opaque, as in this PR: https://github.com/dmlc/tvm/pull/3268.
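
If you want to experiment before that change lands, one untested sketch of the same idea from Python is to re-register the fusion pattern of concatenate as opaque at a higher priority level (the level value is an assumption; if the override is rejected, the C++-side change from the PR is needed instead):

from tvm.relay.op import OpPattern, register_pattern

# Experimental: mark concatenate as OPAQUE so the fuser leaves it alone.
# level=15 is assumed to outrank the default registration; this may need
# the C++-side patch from the PR instead.
register_pattern("concatenate", OpPattern.OPAQUE, level=15)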