TVM with LLVM is far slower than PyTorch for VGG16 inference?

I’m trying to compare TVM and PyTorch for VGG16 inference.

Code for PyTorch:

import torch
import time
from torchvision import datasets, models, transforms

model = models.vgg16()
state_dict = torch.load('vgg16-397923af.pth', map_location='cpu') # add map_location='cpu' if no gpu
print("load model")
model.load_state_dict(state_dict)

print("load over")
example = torch.rand(1, 3, 224, 224)
model.eval()
print("begin eval")

since = time.time()
for i in range(1000):
    model(example)
    if i%100 == 0:
        print("i: {} is processed.".format(i))
time_elapsed = time.time() - since
print('Time elapsed is {:.0f}m {:.0f}s'.
        format(time_elapsed // 60, time_elapsed % 60))

Code for TVM:

import onnx
import time
import tvm
import numpy as np
import tvm.relay as relay
from PIL import Image
from tvm.contrib import graph_runtime

onnx_model = onnx.load('vgg16.onnx')
x = np.random.rand(1, 3, 224, 224)
input_name = '0'
shape_dict = {input_name: x.shape}
sym, params = relay.frontend.from_onnx(onnx_model, shape_dict)

target = 'llvm'
ctx = tvm.cpu(0)
dtype = 'float32'

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build_module.build(sym, target, target_host="llvm", params=params)
print("graph lib is over")

module = graph_runtime.create(graph, lib, ctx)

module.set_input(**params)
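# Convert the input to float32 once, outside the timed loop.
# (Renamed from `input` to avoid shadowing the Python builtin.)
input_data = tvm.nd.array(x.astype(dtype), ctx)
since = time.time()
for i in range(1000):
    module.set_input(input_name, input_data)
    module.run()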
    if i%100 == 0:
        print("i: {} is processed.".format(i))
time_elapsed = time.time() - since
print('Time elapsed is {:.0f}m {:.0f}s'.
      format(time_elapsed // 60, time_elapsed % 60))
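
Both loops above time all 1000 iterations with time.time() and include the very first calls. A minimal sketch of a timing helper with untimed warm-up runs and per-call latencies is given below. The names `benchmark`, `summarize`, and `run_once` are hypothetical and not part of TVM or PyTorch; the helper only assumes that the workload can be wrapped in a zero-argument callable, such as `lambda: model(example)` or `module.run`.

```python
import statistics
import time


def benchmark(run_once, warmup=10, iters=100):
    """Time a zero-argument callable; return per-call latencies in milliseconds."""
    # Warm-up calls are executed but not timed, so one-off costs
    # (lazy initialization, cache fills) do not skew the measurement.
    for _ in range(warmup):
        run_once()

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return (mean, median, min) of the measured latencies in milliseconds."""
    return (statistics.mean(latencies_ms),
            statistics.median(latencies_ms),
            min(latencies_ms))


if __name__ == "__main__":
    # Stand-in workload for illustration only; swap in the real inference call.
    lat = benchmark(lambda: sum(range(10000)), warmup=5, iters=20)
    print("mean %.3f ms, median %.3f ms, min %.3f ms" % summarize(lat))
```

Reporting the median and minimum next to the mean makes it easier to see whether a few slow iterations dominate the total.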

TVM also prints the following warnings:

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553435785
WARNING:root:Constant evaluating Reshape's shape argument, may reduce performance

Results (for 1000 inferences):
On CPU, TVM takes 4m47s while PyTorch takes only 98s.

On GPU (P40), TVM takes 6s and PyTorch takes 7s.
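
To make the gap easier to read, the totals above can be converted to per-inference latency. This is just arithmetic on the reported numbers; the helper name `per_inference_ms` is made up for illustration.

```python
def per_inference_ms(total_seconds, n_runs):
    """Convert a total wall-clock time into average milliseconds per run."""
    return total_seconds * 1000.0 / n_runs


N_RUNS = 1000

# CPU totals as reported: TVM 4m47s (287 s), PyTorch 98 s.
tvm_cpu_ms = per_inference_ms(4 * 60 + 47, N_RUNS)
torch_cpu_ms = per_inference_ms(98, N_RUNS)

# GPU (P40) totals as reported: TVM 6 s, PyTorch 7 s.
tvm_gpu_ms = per_inference_ms(6, N_RUNS)
torch_gpu_ms = per_inference_ms(7, N_RUNS)

print("CPU: TVM %.0f ms, PyTorch %.0f ms, TVM/PyTorch = %.2fx"
      % (tvm_cpu_ms, torch_cpu_ms, tvm_cpu_ms / torch_cpu_ms))
print("GPU: TVM %.0f ms, PyTorch %.0f ms, TVM/PyTorch = %.2fx"
      % (tvm_gpu_ms, torch_gpu_ms, tvm_gpu_ms / torch_gpu_ms))
# CPU: TVM 287 ms, PyTorch 98 ms, TVM/PyTorch = 2.93x
# GPU: TVM 6 ms, PyTorch 7 ms, TVM/PyTorch = 0.86x
```

So on CPU, TVM is roughly 2.9x slower per inference, while on GPU it is slightly faster.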

TVM version: commit a5acca929f5a72f38c1ae19508a4efb7947f05bc
PyTorch version: 1.1.0

Can you point out what is wrong with this experiment?