[RELAY] Buffer reuse in strided slice


#1

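Currently we need to use strided_slice to split the output of CombineParallelConv2D, which copies each slice into its own buffer. If the batch size is 1, or the channel dimension C is the leading dimension, each slice is a contiguous region of the original buffer, so the copy can be eliminated: we can reuse the original buffer and compute the offset of each slice.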
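To make the offset calculation concrete, here is a rough sketch in plain Python. It is not TVM code: `slice_offsets` is a hypothetical helper, and it assumes a dense row-major buffer with offsets counted in elements rather than bytes. It returns `None` when the slices are not contiguous and a copy would still be needed.

```python
from math import prod


def slice_offsets(shape, axis, sizes):
    """Return (offset, length) in elements for each slice along `axis`.

    Hypothetical helper, assuming a dense row-major buffer.
    Returns None when the slices are not contiguous in memory.
    """
    if sum(sizes) != shape[axis]:
        raise ValueError("slice sizes must add up to the extent of the split axis")
    # A slice is one contiguous block only if every dimension before
    # the split axis has extent 1 (e.g. batch size 1 in NCHW).
    if prod(shape[:axis]) != 1:
        return None
    # Number of elements covered by one index step along the split axis.
    inner = prod(shape[axis + 1:])
    offsets = []
    start = 0
    for size in sizes:
        offsets.append((start * inner, size * inner))
        start += size
    return offsets


# NCHW with batch size 1, two combined convs with 16 and 32 output channels.
print(slice_offsets((1, 48, 8, 8), axis=1, sizes=[16, 32]))
# Same split with batch size 2: slices are interleaved across batches.
print(slice_offsets((2, 48, 8, 8), axis=1, sizes=[16, 32]))
```

With batch size 1 the first call yields one `(offset, length)` pair per branch, so each consumer could read directly from the combined buffer. With batch size 2 the helper returns `None`, since each slice is spread over two separate regions.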

cc @tqchen @masahi


#2

That is true. However, we can skip this for now, as it is not necessarily the bottleneck and might require a bit more thought in planning.