This is imported from https://github.com/dmlc/tvm/issues/2924
I have recently been testing GluonCV models with TVM deployment, and I ran into an unexpected performance issue.
Specifically, I am trying to run a CV model on frames from a video, using OpenCV to read each frame.
OpenCV is fast
Here’s a piece of code that reads each frame and switches the blue and red channels:
import time
import cv2
cap = cv2.VideoCapture('demo_video.mp4')
for i in range(30):
    tic = time.time()
    ret, frame = cap.read()
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    time_diff = time.time() - tic
    if i > 10:
        print(int(time_diff*1000))
cap.release()
The output should be around 1~3 ms per cycle. This is fast as expected.
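For anyone trying to reproduce this, the measurement pattern used in the loops here (time one step per frame, ignore the first iterations) can be factored into a small helper. This is only a sketch: `time_step_ms` and `summarize` are hypothetical names, not part of OpenCV or TVM, and the workload below is a stand-in for the `cap.read()` plus `cvtColor` pair.

```python
import statistics
import time


def time_step_ms(step, warmup=10, repeats=20):
    """Call `step` repeatedly and return per-call wall-clock times in ms.

    The first `warmup` calls are executed but not recorded, mirroring
    the `if i > 10` check in the loop from the post.
    """
    samples = []
    for i in range(warmup + repeats):
        tic = time.perf_counter()
        step()
        elapsed_ms = (time.perf_counter() - tic) * 1000.0
        if i >= warmup:
            samples.append(elapsed_ms)
    return samples


def summarize(samples):
    """Return (median, max) of the recorded samples, in ms."""
    return statistics.median(samples), max(samples)


if __name__ == "__main__":
    # Stand-in workload (an assumption for illustration); replace it
    # with a function that reads a frame and converts its channels.
    samples = time_step_ms(lambda: sum(range(10000)))
    median_ms, worst_ms = summarize(samples)
    print("median %.3f ms, worst %.3f ms" % (median_ms, worst_ms))
```

Reporting the median and the worst case, rather than printing every frame, makes it easier to compare runs with and without the TVM build step.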
TVM + OpenCV is slow
Now let's add some TVM code to it.
import time
import cv2
import tvm
from tvm.relay.testing.config import ctx_list
from tvm import relay
from tvm.contrib import graph_runtime
net, params = relay.testing.resnet.get_workload(
    num_layers=18, batch_size=1, image_shape=(3, 224, 224))
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(net, 'llvm', params=params)
cap = cv2.VideoCapture('demo_video.mp4')
for i in range(20):
    tic = time.time()
    ret, frame = cap.read()
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    time_diff = time.time() - tic
    if i > 10:
        print(int(time_diff*1000))
cap.release()
Note: the compiled TVM model is never executed; it is only built.
The result is 51 ms per cycle, which is far slower than the first case.
TVM + Numpy is fast
I replaced the cvtColor call with the equivalent numpy implementation:
import time
import cv2
import tvm
from tvm.relay.testing.config import ctx_list
from tvm import relay
from tvm.contrib import graph_runtime
net, params = relay.testing.resnet.get_workload(
    num_layers=18, batch_size=1, image_shape=(3, 224, 224))
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(net, 'llvm', params=params)
cap = cv2.VideoCapture('demo_video.mp4')
for i in range(20):
    tic = time.time()
    ret, frame = cap.read()
    frame[:,:,(2,1,0)] = frame[:,:,(0,1,2)]
    time_diff = time.time() - tic
    if i > 10:
        print(int(time_diff*1000))
cap.release()
And the speed returns to 3~4 ms per cycle. It is 4 ms rather than 2 ms because cvtColor is expected to be more efficient than numpy indexing. However, TVM has an implicit negative effect on performance that “boosts” cvtColor up to 50 ms.
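As a quick sanity check that the numpy indexing really performs the same channel reordering, here is a minimal sketch on a synthetic array (an assumption for illustration; it does not touch OpenCV or a real video frame):

```python
import numpy as np

# Synthetic 2x2 "frame" with 3 channels; the values are arbitrary.
bgr = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

# In-place swap as in the post. The right-hand side uses fancy indexing,
# which returns a copy, so the assignment never reads overwritten data.
swapped = bgr.copy()
swapped[:, :, (2, 1, 0)] = swapped[:, :, (0, 1, 2)]

# Reversing the last axis yields the same channel order as a view.
reversed_view = bgr[:, :, ::-1]

assert np.array_equal(swapped, reversed_view)
```

The fancy-indexed assignment allocates a temporary copy of the frame on every iteration, which is consistent with it being somewhat slower than a dedicated conversion routine.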
Environments:
OS: Ubuntu 16.04
cv2: pip install opencv-python (4.0.0.21)
tvm: master at dfe4c466
hardware: AWS C5.18x Instance
data: I believe you can reproduce it with an arbitrary input video.