Scaling design for NN (Nearest Neighbour) and BILINEAR

With reference to the resize bilinear operation discussion at https://github.com/dmlc/nnvm/pull/472

Existing Support:
upsampling, with args layout (NCHW or NHWC) and scale (integer)

To be considered with Bilinear:
1: mode (new arg): To choose NN or BILINEAR
2: scale (modify): Make it a tuple to support asymmetric scaling.

To Discuss:
1: Do we rename upsampling -> scale (ideally it enlarges or squeezes the input)?
2: Scale factor or output resolution? (A scale factor could become an awkward fraction for a minor change in the intended output resolution.)

Analysis:
Convolution approach for bilinear scaling (Ref: https://github.com/dmlc/tvm/pull/772).

3: Another observation: we don't have transposed convolution for the NHWC layout!

Do you mean Conv2DTransposeParam? We have one parameter named “layout”, which can be set to “NHWC”. The default is “NCHW”.

and

There is no support for NHWC here. In the worst case, I think we could work around it by transposing the input and output and using NCHW.

If so, we can do it like Conv2D. Much of the logic of conv2d_transpose_nhwc should be the same as conv2d_transpose_nchw.
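
For reference, that transpose workaround could look roughly like this (a sketch only; it assumes topi.transpose and topi.nn.conv2d_transpose_nchw keep their current signatures):

import topi

# NHWC -> NCHW, reuse the existing NCHW kernel, then transpose back
# (the kernel must already be in the layout conv2d_transpose_nchw expects)
data_nchw = topi.transpose(data_nhwc, (0, 3, 1, 2))
out_nchw = topi.nn.conv2d_transpose_nchw(data_nchw, kernel, strides, padding, out_dtype)
out_nhwc = topi.transpose(out_nchw, (0, 2, 3, 1))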

My analysis of the convolution approach ran into some challenges, as explained below.

Ref: http://tech-algorithm.com/articles/bilinear-image-scaling/

Asymmetric scale: this means scaling from 100 to 210; the scale factor here is 2.1.
According to the algorithm, the stride and window values are dynamic in this case.

The solution I could think of:
Going further into the implementation details, the catch lies in the x, y, x_diff, y_diff calculation for each pixel in the output.

These values can be computed during the nnvm build process (from the input and output shapes) and added to the params list.

At run time we just substitute into Y = A(1-w)(1-h) + B(w)(1-h) + C(h)(1-w) + D(w)(h).
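
To illustrate the build-time / run-time split, here is a minimal NumPy sketch (the helper names precompute_tables and resize_with_tables are hypothetical, not existing nnvm code):

import numpy as np

def precompute_tables(in_size, out_size):
    # hypothetical build-time helper: for each output index, store the two
    # source indices and the fractional weight toward the higher index
    coords = np.arange(out_size) * (float(in_size) / out_size)
    idx0 = np.floor(coords).astype(np.int32)
    idx1 = np.minimum(idx0 + 1, in_size - 1)
    frac = (coords - idx0).astype(np.float32)
    return idx0, idx1, frac

def resize_with_tables(img, out_h, out_w):
    # hypothetical run-time step for an HxWxC image: table lookups plus
    # Y = A(1-w)(1-h) + B(w)(1-h) + C(h)(1-w) + D(w)(h)
    y0, y1, h = precompute_tables(img.shape[0], out_h)
    x0, x1, w = precompute_tables(img.shape[1], out_w)
    A = img[y0[:, None], x0[None, :]]  # top-left neighbours
    B = img[y0[:, None], x1[None, :]]  # top-right
    C = img[y1[:, None], x0[None, :]]  # bottom-left
    D = img[y1[:, None], x1[None, :]]  # bottom-right
    h = h[:, None, None]  # broadcast over width and channels
    w = w[None, :, None]
    return A * (1 - w) * (1 - h) + B * w * (1 - h) + C * h * (1 - w) + D * w * h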

For scaling to a 299x299x3 image this adds nearly 4 MB to the params size (4 values per pixel x 4 bytes x 299 x 299 x 3).

Or about 3 MB if we pack (x, y) into 4 bytes.

Finally, we could take the approach below for bilinear_resize.

TVM:

A helper function to build the weights tensor from the input and output shapes.
An operator that receives the input image and weights and performs the scaling.

NNVM:

Just a symbol interface to make it available on the frontend.
Weight generation can be done with the same TVM helper function in the frontend.

Operator Args:
inputs : data or data & weights
layout : (NHWC, …etc.)
mode : NN or BILINEAR
out_size : (width, height)
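
For illustration, a hypothetical frontend call with these args (neither the symbol name bilinear_resize nor the argument spellings are final):

import nnvm.symbol as sym

data = sym.Variable("data")
# hypothetical symbol, per the proposal above
out = sym.bilinear_resize(data, layout="NHWC", mode="BILINEAR", out_size=(299, 299))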

@tqchen what do you advise?

So you mean that we can support a non-integer scale factor, right?

Yes, a non-integer scaling factor.

That is good, because I know some implementations only support integer scales. Could you explain the algorithm in more detail? An example would be even nicer. I am very interested.

Here is the core logic required to compute a target pixel.

Consider upscaling from 100x100 to 250x250.

In each dimension we need to generate 150 extra pixels that fall between the 100 source pixels.
Sometimes there is 1 new pixel between two source pixels and sometimes there are 2 (hence the window is not uniform for an asymmetric scale).

The bilinear approach takes the 4 surrounding pixels from the input image and derives each new pixel in the target with a certain weight from each of them.

1: Every pixel in the scaled result requires 4 pixels from the input image.
2: The fractional offsets of the target from the above 4 pixels, on a scale of 0 to 1, are the weights.
Only w and h are needed, as (1-w) and (1-h) give the weights for the other pixels.

The calculation below derives each target pixel from 4 pixels in the input:
Y = A(1-w)(1-h) + B(w)(1-h) + C(h)(1-w) + D(w)(h)
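
For a concrete example, take the 100x100 -> 250x250 case above: target pixel x = 7 maps back to 7 * (100/250) = 2.8, so its neighbours come from source columns 2 and 3 with w = 0.8; the same for y = 7 gives h = 0.8, and the formula becomes Y = 0.04A + 0.16B + 0.16C + 0.64D. For the asymmetric 100 -> 210 case, x = 5 maps to 5 * (100/210) ≈ 2.38, giving w ≈ 0.38; the fraction differs for every output pixel, which is why the weights must be stored per location.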

Our approach for TVM:
As the input shape for a graph is fixed, we can precompute the “source pixel indices” and “weights” for each output pixel and store them in params.

Hence on the target it is just the Y calculation from the input and weights.

I am working on sample code in TVM and will share it soon.

Looking forward to it

Maybe you could ref this implementation: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/kernels/internal/reference/reference_ops.h#L2959

I have implemented it in Python and it should be easy. TF’s computation seems to have one special attribute, align_corners, which will affect the result.

TensorFlow uses the same algorithm. align_corners is a small modification which I will consider.
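
For context, align_corners only changes how an output coordinate maps back into the input grid. A sketch of the usual handling (this reflects my understanding of TF’s resize_bilinear, so treat the exact condition as an assumption):

# align_corners=True lines the corner pixels of input and output up exactly
if align_corners and out_size > 1:
    scale = (in_size - 1) / float(out_size - 1)
else:
    scale = in_size / float(out_size)
input_coord = output_coord * scale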

Ok. However, I am also trying to port my implementation into TVM topi. I am not very familiar with the tvm compute mechanism yet, so I want to ask for your advice. When I move my numpy implementation to tvm topi, how do I move these for loops into the tvm.compute mechanism? Let me show you the code example:

import math

for b in range(batches):
    for y in range(output_height):
        for x in range(output_width):
            for c in range(depth):
                # map the output coordinate back into the input grid
                input_y = y * height_scale
                y0 = int(math.floor(input_y))
                y1 = min(y0 + 1, input_height - 1)
                input_x = x * width_scale
                x0 = int(math.floor(input_x))
                x1 = min(x0 + 1, input_width - 1)
                # weighted sum of the 4 neighbouring input pixels
                interpolation = input_data[b, y0, x0, c] * (1 - (input_y - y0)) * (1 - (input_x - x0)) + \
                                input_data[b, y1, x0, c] * (input_y - y0) * (1 - (input_x - x0)) + \
                                input_data[b, y0, x1, c] * (1 - (input_y - y0)) * (input_x - x0) + \
                                input_data[b, y1, x1, c] * (input_y - y0) * (input_x - x0)
                output_data[b, y, x, c] = interpolation

The computed result is stored in output_data. I haven’t found a tutorial or example about this.

Here is a working Python sample.

  • scale down needs a small change in the logic, hence that part is not clear yet.

In a nutshell:

compute takes the output shape and a lambda function. It uses the shape to generate the iterators and calls the lambda, which returns the compute logic in terms of those iterator variables.

Later, the lowering step generates the IR from the iterators and the computational logic.

You can refer to and play around with some existing sample code (maybe up_sampling, which is probably the closest to this).

Yes, I have read the up_sampling example. However, as in your GitHub example, we have more complex logic; the up_sampling example just passes the scale to H / W. Currently I don’t understand how to port it into tvm.compute (as you said, we have the output shape and one lambda, but how do we combine our logic via its iterator variables?).
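
To make the mapping concrete: each for loop over the output becomes one axis of the shape passed to tvm.compute, and the loop body becomes the lambda, rewritten with tvm expressions (tvm.floor for math.floor, tvm.min for min). A rough, untested sketch along those lines; resize_bilinear_nhwc is just an illustrative name:

import tvm

def resize_bilinear_nhwc(data, out_h, out_w):
    # data: 4-D tensor in NHWC layout; static input shape assumed
    batch, in_h, in_w, channel = data.shape
    h_scale = float(in_h.value) / out_h
    w_scale = float(in_w.value) / out_w

    def _compute(b, y, x, c):
        # b, y, x, c arrive here as iteration variables, one per output axis
        in_y = y.astype("float32") * h_scale
        in_x = x.astype("float32") * w_scale
        y0 = tvm.floor(in_y).astype("int32")
        x0 = tvm.floor(in_x).astype("int32")
        y1 = tvm.min(y0 + 1, in_h - 1)
        x1 = tvm.min(x0 + 1, in_w - 1)
        wy = in_y - y0.astype("float32")
        wx = in_x - x0.astype("float32")
        return (data[b, y0, x0, c] * (1 - wy) * (1 - wx) +
                data[b, y1, x0, c] * wy * (1 - wx) +
                data[b, y0, x1, c] * (1 - wy) * wx +
                data[b, y1, x1, c] * wy * wx)

    return tvm.compute((batch, out_h, out_w, channel), _compute,
                       name="resize_bilinear")

The four Python loops disappear into the output shape tuple; the schedule then decides how those axes are actually executed.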