[SOLVED] GPU mode is slower than CPU?

Sorry, I may have made a mistake, but this confused me.

I tested the from_mxnet.py sample from the tutorials (https://docs.tvm.ai/tutorials/nnvm/from_mxnet.html) in both modes (GPU and CPU).

I found that GPU mode is slower than CPU mode, which seems inconceivable.

The only code that differs is:
CPU:
target = 'llvm'
ctx = tvm.cpu()

GPU:
target = 'cuda'
ctx = tvm.gpu()

Is anything wrong?

This could happen for many reasons, especially if you are running a custom model without pretuned schedules. Can you share more details about the model you are running?

Sure

GPU code ######################################################################start

import mxnet as mx
import nnvm
import tvm
import numpy as np
import os

from mxnet.gluon.model_zoo.vision import get_model
from mxnet.gluon.utils import download

import cv2

# load the pretrained Gluon model and download the test image / label file
block = get_model('resnet34_v1', pretrained=True)
img_name = 'cat.png'

synset_url = ''.join(['https://gist.githubusercontent.com/zhreshold/',
                      '4d0b62f3d01426887599d4f7ede23ee5/raw/',
                      '596b27d23537e5a1b5751d2b0481ef172f58b539/',
                      'imagenet1000_clsid_to_human.txt'])
synset_name = 'synset.txt'
download('https://github.com/dmlc/mxnet.js/blob/master/data/cat.png?raw=true', img_name)
download(synset_url, synset_name)
with open(synset_name) as f:
    synset = eval(f.read())

image = cv2.imread(img_name)
image = cv2.resize(image, (224, 224), interpolation=cv2.INTER_CUBIC)

def transform_image(image):
    # normalize with the ImageNet mean/std and convert HWC -> NCHW
    image = np.array(image) - np.array([123., 117., 104.])
    image /= np.array([58.395, 57.12, 57.375])
    image = image.transpose((2, 0, 1))
    image = image[np.newaxis, :]
    return image

x = transform_image(image)
print('x', x.shape)

os.environ["TVM_NUM_THREADS"] = str(32)

# convert the Gluon model to NNVM and append a softmax
sym, params = nnvm.frontend.from_mxnet(block)
sym = nnvm.sym.softmax(sym)

import nnvm.compiler

target = 'cuda'

shape_dict = {'data': x.shape}
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(sym, target, shape_dict, params=params)

from tvm.contrib import graph_runtime

ctx = tvm.gpu(0)

dtype = 'float32'
m = graph_runtime.create(graph, lib, ctx)

# run inference 100 times and time each iteration
n = 100
counter = 1
sum_cout = 0

while counter <= n:
    counter += 1
    start = cv2.getTickCount()
    m.set_input('data', tvm.nd.array(x.astype(dtype)))
    m.set_input(**params)
    # execute
    m.run()
    # get outputs
    tvm_output = m.get_output(0)
    top1 = np.argmax(tvm_output.asnumpy()[0])
    print('TVM prediction top-1:', top1, synset[top1])
    end = cv2.getTickCount()
    during1 = (end - start) * 1000 / cv2.getTickFrequency()
    print(during1)
    sum_cout += during1

###################################################################### end

Execute result:

('TVM prediction top-1:', 285, 'Egyptian cat')
344.089146
('TVM prediction top-1:', 285, 'Egyptian cat')
307.417056
('TVM prediction top-1:', 285, 'Egyptian cat')
306.406346
('TVM prediction top-1:', 285, 'Egyptian cat')
306.430252
('TVM prediction top-1:', 285, 'Egyptian cat')
306.464833
('TVM prediction top-1:', 285, 'Egyptian cat')
312.29102

CPU code ######################################################################start

import mxnet as mx
import nnvm
import tvm
import numpy as np
import os

from mxnet.gluon.model_zoo.vision import get_model
from mxnet.gluon.utils import download

import cv2

block = get_model('resnet34_v1', pretrained=True)
img_name = 'cat.png'

synset_url = ''.join(['https://gist.githubusercontent.com/zhreshold/',
                      '4d0b62f3d01426887599d4f7ede23ee5/raw/',
                      '596b27d23537e5a1b5751d2b0481ef172f58b539/',
                      'imagenet1000_clsid_to_human.txt'])
synset_name = 'synset.txt'
download('https://github.com/dmlc/mxnet.js/blob/master/data/cat.png?raw=true', img_name)
download(synset_url, synset_name)
with open(synset_name) as f:
    synset = eval(f.read())

image = cv2.imread(img_name)
image = cv2.resize(image, (224, 224), interpolation=cv2.INTER_CUBIC)

def transform_image(image):
    image = np.array(image) - np.array([123., 117., 104.])
    image /= np.array([58.395, 57.12, 57.375])
    image = image.transpose((2, 0, 1))
    image = image[np.newaxis, :]
    return image

x = transform_image(image)
print('x', x.shape)

os.environ["TVM_NUM_THREADS"] = str(32)
sym, params = nnvm.frontend.from_mxnet(block)
sym = nnvm.sym.softmax(sym)

import nnvm.compiler
# identical to the GPU script except for target and ctx
target = 'llvm'

shape_dict = {'data': x.shape}
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(sym, target, shape_dict, params=params)

from tvm.contrib import graph_runtime
ctx = tvm.cpu()

dtype = 'float32'
m = graph_runtime.create(graph, lib, ctx)

n = 100
counter = 1
sum_cout = 0

while counter <= n:
    counter += 1
    start = cv2.getTickCount()
    m.set_input('data', tvm.nd.array(x.astype(dtype)))
    m.set_input(**params)
    # execute
    m.run()
    # get outputs
    tvm_output = m.get_output(0)
    top1 = np.argmax(tvm_output.asnumpy()[0])
    print('TVM prediction top-1:', top1, synset[top1])
    end = cv2.getTickCount()
    during1 = (end - start) * 1000 / cv2.getTickFrequency()
    print(during1)
    sum_cout += during1

######################################################################end

Execute result:

('TVM prediction top-1:', 285, 'Egyptian cat')
59.733561
('TVM prediction top-1:', 285, 'Egyptian cat')
47.446277
('TVM prediction top-1:', 285, 'Egyptian cat')
47.153657
('TVM prediction top-1:', 285, 'Egyptian cat')
47.121456
('TVM prediction top-1:', 285, 'Egyptian cat')
47.088481
('TVM prediction top-1:', 285, 'Egyptian cat')
62.241407
('TVM prediction top-1:', 285, 'Egyptian cat')
46.965461
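For reference, the loop above times the tvm.nd.array conversion, the per-iteration m.set_input(**params) parameter upload, and the device-to-host copy together with the actual run. A minimal sketch of timing only the compiled graph with graph_runtime's time_evaluator, reusing m, ctx, x, params and dtype from the scripts above (the number/repeat values here are just examples):

m.set_input('data', tvm.nd.array(x.astype(dtype)))
m.set_input(**params)                                  # upload parameters once, outside the timed region
ftimer = m.module.time_evaluator("run", ctx, number=10, repeat=3)
prof = ftimer()                                        # runs the graph and synchronizes the device
print('mean inference time (ms):', prof.mean * 1000)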

Thanks for sharing an example. Do you see any warnings saying that a fallback configuration is used, and what model of NVIDIA GPU are you using?

Yes, some warnings are printed in CPU mode, like:

('x', (1, 3, 224, 224))
WARNING:root:Failed to download tophub package for llvm: <urlopen error [Errno -3] Temporary failure in name resolution>
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 3, 224, 224, 'float32'), (64, 3, 7, 7, 'float32'), (2, 2), (3, 3), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 64, 56, 56, 'float32'), (64, 64, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 1, 1, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 3, 3, 'float32'), (2, 2), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 128, 28, 28, 'float32'), (128, 128, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 1, 1, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 3, 3, 'float32'), (2, 2), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 14, 14, 'float32'), (256, 256, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 1, 1, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 3, 3, 'float32'), (2, 2), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('conv2d', (1, 512, 7, 7, 'float32'), (512, 512, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=llvm, workload=('dense', (1, 512, 'float32'), (1000, 512, 'float32'), (1000, 'float32')). A fallback configuration is used, which may bring great performance regression.
('TVM prediction top-1:', 207, 'golden retriever')
79.660487
('TVM prediction top-1:', 285, 'Egyptian cat')
49.901446
('TVM prediction top-1:', 285, 'Egyptian cat')
47.935851
('TVM prediction top-1:', 285, 'Egyptian cat')
86.267957

And in GPU mode, the warnings look like:
('x', (1, 3, 224, 224))
WARNING:root:Failed to download tophub package for cuda: <urlopen error [Errno -3] Temporary failure in name resolution>
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 3, 224, 224, 'float32'), (64, 3, 7, 7, 'float32'), (2, 2), (3, 3), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 64, 56, 56, 'float32'), (64, 64, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 1, 1, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 64, 56, 56, 'float32'), (128, 64, 3, 3, 'float32'), (2, 2), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 128, 28, 28, 'float32'), (128, 128, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 1, 1, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 128, 28, 28, 'float32'), (256, 128, 3, 3, 'float32'), (2, 2), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 256, 14, 14, 'float32'), (256, 256, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 1, 1, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 256, 14, 14, 'float32'), (512, 256, 3, 3, 'float32'), (2, 2), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (1, 512, 7, 7, 'float32'), (512, 512, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
('TVM prediction top-1:', 285, 'Egyptian cat')
313.300898
('TVM prediction top-1:', 285, 'Egyptian cat')
306.277236
('TVM prediction top-1:', 285, 'Egyptian cat')
312.670918
('TVM prediction top-1:', 285, 'Egyptian cat')
305.800381

My machine is a GeForce GTX 1080 Ti with CUDA 9.0 and NVIDIA driver 396.37.

Thanks for your response.
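For reference, the "Failed to download tophub package" warnings above mean the pretuned schedule packages could not be fetched (the machine has no working DNS/network access), which is why autotvm falls back to default configs on both targets. A quick check for whether any tophub packages are cached locally; a sketch assuming TVM keeps the cache under ~/.tvm/tophub:

import os

tophub_dir = os.path.expanduser('~/.tvm/tophub')   # assumed cache location
if os.path.isdir(tophub_dir):
    print('cached tophub packages:', os.listdir(tophub_dir))
else:
    print('no tophub cache found; pretuned schedules are not being applied')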

I think this is expected. If you auto-tune for the GPU, you will get a speedup and it will be faster than the CPU. I have had a similar experience before.

Thanks, and how can I "auto tune" the GPU?

Thanks, all!

I found my answer.


Hi @Lee, how did you get rid of those warnings?

see https://docs.tvm.ai/tutorials/autotvm/tune_relay_mobile_gpu.html and https://docs.tvm.ai/tutorials/autotvm/tune_relay_arm.html
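For a CUDA target, the flow in those tutorials is roughly: extract tuning tasks from the graph, tune each task, log the best configurations, and rebuild with the log applied. A minimal sketch assuming the NNVM frontend used in the scripts above (it reuses sym, x and params; the trial count and log file name are just examples, and XGBTuner requires xgboost to be installed):

import tvm
import nnvm
import nnvm.compiler
from tvm import autotvm

target = tvm.target.cuda()

# extract conv2d tuning tasks from the same symbol/shape used above
# (conv2d dominates the runtime of this model)
tasks = autotvm.task.extract_from_graph(sym, target=target,
                                        shape={'data': x.shape}, dtype='float32',
                                        symbols=(nnvm.sym.conv2d,))

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=10, repeat=3, timeout=4))

log_file = 'resnet34_cuda.log'
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(n_trial=1000,
               measure_option=measure_option,
               callbacks=[autotvm.callback.progress_bar(1000),
                          autotvm.callback.log_to_file(log_file)])

# rebuild with the best tuned configs applied
with autotvm.apply_history_best(log_file):
    with nnvm.compiler.build_config(opt_level=3):
        graph, lib, params = nnvm.compiler.build(sym, target, {'data': x.shape}, params=params)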
