Hello!
I am currently trying to run VGG-16 inference on an Arm CPU. Here is my script:
import numpy as np
import tvm
import tvm.relay as relay
import tvm.relay.testing  # needed so relay.testing.vgg is available
from tvm.contrib import graph_runtime
target_arm_cpu = tvm.target.create('llvm -device=arm_cpu -target=aarch64-linux-gnu')
ctx_arm_cpu = tvm.runtime.cpu()
dtype='float32'
batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)
mod, paramsO = relay.testing.vgg.get_workload(
num_layers=16, batch_size=batch_size, image_shape=image_shape)
opt_level = 3
#arm_cpu
with relay.build_config(opt_level=opt_level):
    graph, lib, params = relay.build_module.build(mod, target_arm_cpu, params=paramsO)
data = tvm.nd.array(np.random.uniform(-1, 1, size=data_shape).astype(dtype), ctx_arm_cpu)
module = graph_runtime.create(graph, lib, ctx_arm_cpu)
module.set_input("data", data)
module.set_input(**params)
print("RUNNING")
timer = module.module.time_evaluator("run", ctx_arm_cpu, number=1, repeat=2)
prof_res = np.array(timer().results) * 1000  # convert seconds to milliseconds
print("arm CPU -> Mean inference time (std dev): %.2f ms (%.2f ms)" %(np.mean(prof_res), np.std(prof_res)))
When I run the above code, the result is:
arm CPU -> Mean inference time (std dev): 1954.49 ms (0.57 ms)
I remember that with an older version of TVM, VGG-16 inference was measured at about 1000 ms, so performance seems to have dropped by roughly 2x. Is there something I have misunderstood, or am I implementing the code incorrectly?
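As a side note on the measurement itself: with `number=1, repeat=2` the time_evaluator only collects two samples, so the reported mean and especially the standard deviation are not very trustworthy; a few warm-up runs plus a larger `repeat` usually give more stable numbers. The same measurement pattern, sketched without TVM so that the helper name and parameters (`benchmark`, `warmup`, `repeat`) are purely illustrative:

```python
import time
import statistics

def benchmark(fn, warmup=3, repeat=10):
    # Warm-up runs: populate caches and trigger any lazy initialization
    # so they do not pollute the timed samples.
    for _ in range(warmup):
        fn()
    results = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        results.append((time.perf_counter() - start) * 1000.0)  # ms
    return statistics.mean(results), statistics.stdev(results)

# Stand-in workload; in the post above this would be one module.run() call.
mean_ms, std_ms = benchmark(lambda: sum(range(100000)))
print("Mean inference time (std dev): %.2f ms (%.2f ms)" % (mean_ms, std_ms))
```

This will not change the underlying ~2x slowdown, but it rules out measurement noise as an explanation before comparing against the old numbers.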