Hello!
I am currently comparing the performance of direct conv2d and Winograd conv2d using TOPI. In my experiments, however, conv2d with the Winograd algorithm is far slower than the direct version. The code I used is below.
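## imports and target (assumed here: CUDA device 0)
import numpy as np
import tvm
import topi  # on newer TVM releases: from tvm import topi
target = 'cuda'
ctx = tvm.gpu(0)
## data shape
data_shape = (1,3,224,224)
w_shape = (64,3,3,3)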
## Data
sample_data = np.random.uniform(-1,1, size=data_shape ).astype("float32")
sample_p1 = np.random.uniform(-1,1, size=w_shape ).astype("float32")
## placeholder
input_data = tvm.te.placeholder( shape = data_shape, dtype = "float32", name="Input" )
p1 = tvm.te.placeholder( shape = w_shape, dtype="float32", name="p1" )
## Winograd conv2d
with tvm.target.create('cuda'):
    # arguments: data, kernel, strides, padding, dilation, out_dtype
    conv = topi.cuda.conv2d_nchw_winograd(input_data, p1, (1,1), (0,0), (1,1), "float32")
    sch = topi.cuda.schedule_conv2d_nchw_winograd([conv])
    winoMod = tvm.build(sch, [input_data, p1, conv], target, name='wino')
## Direct conv2d
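with tvm.target.create('cuda'):
    # arguments: data, kernel, strides, padding, dilation
    conv = topi.cuda.conv2d_nchw(input_data, p1, [1,1], [0,0], [1,1])
    sch = topi.cuda.schedule_conv2d_nchw([conv])
    # include the output tensor so both modules share the signature (input, weight, output)
    simpleMod = tvm.build(sch, [input_data, p1, conv], target, name='direct')
## Real data
tvm_input = tvm.nd.array( sample_data , ctx )
tvm_p1 = tvm.nd.array( sample_p1, ctx )
## Output buffer (both convolutions produce the same output shape)
out_shape = tuple(int(d) for d in conv.shape)
tvm_out = tvm.nd.array( np.zeros(out_shape, dtype="float32"), ctx )
## Performance Testing
ev_wino = winoMod.time_evaluator(winoMod.entry_name, ctx, number=1, repeat=100)
ev_conv = simpleMod.time_evaluator(simpleMod.entry_name, ctx, number=1, repeat=100)
## .mean is in seconds; *1e3 converts it to milliseconds
timer = ev_conv(tvm_input, tvm_p1, tvm_out).mean*1e3
print("Conv with Direct algo -> ", timer)
timer = ev_wino(tvm_input, tvm_p1, tvm_out).mean*1e3
print("Conv with Winograd Strassen algo -> ", timer)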
The execution result is as follows (times in milliseconds).
Conv with Direct algo -> 0.11522044
Conv with Winograd Strassen algo -> 4.70840109
The performance gap is too large. Based on the paper "Fast Algorithms for Convolutional Neural Networks", I expected Winograd conv2d to be as fast as or faster than direct conv2d. Is there something I misunderstood?
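For context, the expectation of a speedup comes from the multiplication count in that paper: for a 3x3 filter, the F(2x2, 3x3) Winograd variant produces a 2x2 output tile with 16 element-wise multiplications, where direct convolution needs 36. The block below is a minimal NumPy sketch of that single-tile computation, checked against a direct computation. It is only an illustration of the arithmetic, not TOPI's implementation; the function names `winograd_f2x2_3x3` and `direct_2x2` are hypothetical, and the tile size TOPI actually picks may differ.

```python
import numpy as np

# Transform matrices for F(2x2, 3x3) as given in the Winograd minimal
# filtering formulation (input transform, filter transform, output transform).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float64)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float64)


def winograd_f2x2_3x3(d, g):
    """Hypothetical helper: 4x4 input tile d, 3x3 filter g -> 2x2 output tile."""
    U = G @ g @ G.T          # transformed filter, 4x4
    V = B_T @ d @ B_T.T      # transformed input tile, 4x4
    M = U * V                # the 16 element-wise multiplications
    return A_T @ M @ A_T.T   # inverse transform, 2x2


def direct_2x2(d, g):
    """Hypothetical helper: same 2x2 output via sliding-window correlation
    (2 * 2 * 9 = 36 multiplications)."""
    out = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = rng.standard_normal((4, 4))
    g = rng.standard_normal((3, 3))
    print("results match:", np.allclose(winograd_f2x2_3x3(d, g), direct_2x2(d, g)))
```

The 36-to-16 reduction covers only the element-wise products. The input, filter, and output transforms add their own work, so the measured speedup on real hardware is not guaranteed to reach that ratio.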