Scaling design for NN (Nearest Neighbour) and BILINEAR

With reference to the resize bilinear operation discussion at https://github.com/dmlc/nnvm/pull/472

Existing Support:
upsampling, with args layout (NCHW or NHWC) and scale (integer)

To be considered with Bilinear:
1: mode (new arg): To choose NN or BILINEAR
2: scale (modify): Make it a tuple to support asymmetric scaling.

To Discuss:
1: Do we rename upsampling -> scale (ideally it enlarges or squeezes the input)?
2: Scale factor or output resolution? (A scale factor could become an awkward fraction for a minor change in the intended output resolution.)

Analysis:
Convolution approach for bilinear scaling (Ref: https://github.com/dmlc/tvm/pull/772).

3: Another observation: we don't have transposed convolution for the NHWC layout!

Do you mean Conv2DTransposeParam? We have one parameter named “layout”, which can be set to “NHWC”. The default is “NCHW”.

and

There is no support for NHWC here. In the worst case, I think we could work around it by transposing the input and output and using NCHW.

If so, we can do it like Conv2D. Much of the logic of conv2d_transpose_nhwc should be the same as conv2d_transpose_nchw.
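
For reference, that transpose workaround could look roughly like this (a sketch only; it assumes topi.transpose and topi.nn.conv2d_transpose_nchw keep their current signatures):

import topi

# NHWC -> NCHW, reuse the existing NCHW kernel, then transpose back
# (the kernel must already be in the layout conv2d_transpose_nchw expects)
data_nchw = topi.transpose(data_nhwc, (0, 3, 1, 2))
out_nchw = topi.nn.conv2d_transpose_nchw(data_nchw, kernel, strides, padding, out_dtype)
out_nhwc = topi.transpose(out_nchw, (0, 2, 3, 1))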

My analysis of the convolution approach ran into some challenges, as explained below.

Ref: http://tech-algorithm.com/articles/bilinear-image-scaling/

Asymmetric scale: this means scaling from 100 to 210; the scale factor here is 2.1.
According to the algorithm, the stride and window values are dynamic in this case.

The solution I could think of:
Going further into the implementation details, the catch lies in the x, y, x_diff, y_diff calculation for each pixel in the output.

These values can be computed during the nnvm build process (from the input and output shapes) and added to the params list.

At run time we just substitute into Y = A(1-w)(1-h) + B(w)(1-h) + C(h)(1-w) + D(w)(h).
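
To illustrate the build-time / run-time split, here is a minimal NumPy sketch (the helper names precompute_tables and resize_with_tables are hypothetical, not existing nnvm code):

import numpy as np

def precompute_tables(in_size, out_size):
    # hypothetical build-time helper: for each output index, store the two
    # source indices and the fractional weight toward the higher index
    coords = np.arange(out_size) * (float(in_size) / out_size)
    idx0 = np.floor(coords).astype(np.int32)
    idx1 = np.minimum(idx0 + 1, in_size - 1)
    frac = (coords - idx0).astype(np.float32)
    return idx0, idx1, frac

def resize_with_tables(img, out_h, out_w):
    # hypothetical run-time step for an HxWxC image: table lookups plus
    # Y = A(1-w)(1-h) + B(w)(1-h) + C(h)(1-w) + D(w)(h)
    y0, y1, h = precompute_tables(img.shape[0], out_h)
    x0, x1, w = precompute_tables(img.shape[1], out_w)
    A = img[y0[:, None], x0[None, :]]  # top-left neighbours
    B = img[y0[:, None], x1[None, :]]  # top-right
    C = img[y1[:, None], x0[None, :]]  # bottom-left
    D = img[y1[:, None], x1[None, :]]  # bottom-right
    h = h[:, None, None]  # broadcast over width and channels
    w = w[None, :, None]
    return A * (1 - w) * (1 - h) + B * w * (1 - h) + C * h * (1 - w) + D * w * h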

For scaling to a 299x299x3 image this adds nearly 4 MB to the params size (4 values per pixel x 4 bytes x 299 x 299 x 3).

Or about 3 MB if we pack (x, y) into 4 bytes.

Finally, we could take the approach below for bilinear_resize.

TVM:

A helper function to build the weights tensor from the input and output shapes.
An operator that receives the input image and weights and performs the scaling.

NNVM:

Just a symbol interface to make it available on the frontend.
Weight generation can be done with the same TVM helper function in the frontend.

Operator Args:
inputs : data or data & weights
layout : (NHWC, …etc.)
mode : NN or BILINEAR
out_size : (width, height)
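
For illustration, a hypothetical frontend call with these args (neither the symbol name bilinear_resize nor the argument spellings are final):

import nnvm.symbol as sym

data = sym.Variable("data")
# hypothetical symbol, per the proposal above
out = sym.bilinear_resize(data, layout="NHWC", mode="BILINEAR", out_size=(299, 299))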

@tqchen what do you advise?

So you mean that we can support a non-integer scale factor, right?

Yes, a non-integer scaling factor.

That is good, because I know some implementations only support integer scales. Could you explain the algorithm in more detail? An example would be even nicer. I am very interested.

Here is the core logic required to compute a target pixel.

Consider upscaling from 100x100 to 250x250.

In each dimension we need to generate 150 extra pixels that fall between the 100 source pixels.
Sometimes there is 1 new pixel between two source pixels and sometimes there are 2 (hence the window is not uniform for an asymmetric scale).

The bilinear approach takes the 4 surrounding pixels from the input image and derives each new pixel in the target with a certain weight from each of them.

1: Every pixel in the scaled result requires 4 pixels from the input image.
2: The fractional offsets of the target from the above 4 pixels, on a scale of 0 to 1, are the weights.
Only w and h are needed, as (1-w) and (1-h) give the weights for the other pixels.

The calculation below derives each target pixel from 4 pixels in the input:
Y = A(1-w)(1-h) + B(w)(1-h) + C(h)(1-w) + D(w)(h)
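
For a concrete example, take the 100x100 -> 250x250 case above: target pixel x = 7 maps back to 7 * (100/250) = 2.8, so its neighbours come from source columns 2 and 3 with w = 0.8; the same for y = 7 gives h = 0.8, and the formula becomes Y = 0.04A + 0.16B + 0.16C + 0.64D. For the asymmetric 100 -> 210 case, x = 5 maps to 5 * (100/210) ≈ 2.38, giving w ≈ 0.38; the fraction differs for every output pixel, which is why the weights must be stored per location.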

Our approach for TVM:
As the input shape for a graph is fixed, we can precompute the “source pixel indices” and “weights” for each output pixel and store them in params.

Hence on the target it is just the Y calculation from the input and weights.

I am working on sample code in TVM and will share it soon.

Looking forward to it

Maybe you could ref this implementation: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/kernels/internal/reference/reference_ops.h#L2959

I have implemented it in Python and it should be easy. TF’s computation seems to have one special attribute, align_corners, which will affect the result.

TensorFlow uses the same algorithm. align_corners is a small modification which I will consider.
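
For context, align_corners only changes how an output coordinate maps back into the input grid. A sketch of the usual handling (this reflects my understanding of TF’s resize_bilinear, so treat the exact condition as an assumption):

# align_corners=True lines the corner pixels of input and output up exactly
if align_corners and out_size > 1:
    scale = (in_size - 1) / float(out_size - 1)
else:
    scale = in_size / float(out_size)
input_coord = output_coord * scale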

Ok. However, I am also trying to port my implementation into TVM topi. I am not very familiar with the tvm compute mechanism yet, so I want to ask for your advice. When I move my numpy implementation to tvm topi, how do I move these for loops into the tvm.compute mechanism? Let me show you the code example:

import math

for b in range(batches):
    for y in range(output_height):
        for x in range(output_width):
            for c in range(depth):
                # map the output coordinate back into the input grid
                input_y = y * height_scale
                y0 = int(math.floor(input_y))
                y1 = min(y0 + 1, input_height - 1)
                input_x = x * width_scale
                x0 = int(math.floor(input_x))
                x1 = min(x0 + 1, input_width - 1)
                # weighted sum of the 4 neighbouring input pixels
                interpolation = input_data[b, y0, x0, c] * (1 - (input_y - y0)) * (1 - (input_x - x0)) + \
                                input_data[b, y1, x0, c] * (input_y - y0) * (1 - (input_x - x0)) + \
                                input_data[b, y0, x1, c] * (1 - (input_y - y0)) * (input_x - x0) + \
                                input_data[b, y1, x1, c] * (input_y - y0) * (input_x - x0)
                output_data[b, y, x, c] = interpolation

The computed result is stored in output_data. I haven’t found a tutorial or example about this.

Here is a working Python sample.

  • scale down needs a small change in the logic, hence that part is not clear yet.

In a nutshell:

compute takes the output shape and a lambda function. It uses the shape to generate the iterators and calls the lambda, which returns the compute logic in terms of those iterator variables.

Later, the lowering step generates the IR from the iterators and the computational logic.

You can refer to and play around with some existing sample code (maybe up_sampling, which is probably the closest to this).

Yes, I have read the up_sampling example. However, as in your GitHub example, we have more complex logic; the up_sampling example just passes the scale to H / W. Currently I don’t understand how to port it into tvm.compute (as you said, we have the output shape and one lambda, but how do we combine our logic via its iterator variables?).
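
To make the mapping concrete: each for loop over the output becomes one axis of the shape passed to tvm.compute, and the loop body becomes the lambda, rewritten with tvm expressions (tvm.floor for math.floor, tvm.min for min). A rough, untested sketch along those lines; resize_bilinear_nhwc is just an illustrative name:

import tvm

def resize_bilinear_nhwc(data, out_h, out_w):
    # data: 4-D tensor in NHWC layout; static input shape assumed
    batch, in_h, in_w, channel = data.shape
    h_scale = float(in_h.value) / out_h
    w_scale = float(in_w.value) / out_w

    def _compute(b, y, x, c):
        # b, y, x, c arrive here as iteration variables, one per output axis
        in_y = y.astype("float32") * h_scale
        in_x = x.astype("float32") * w_scale
        y0 = tvm.floor(in_y).astype("int32")
        x0 = tvm.floor(in_x).astype("int32")
        y1 = tvm.min(y0 + 1, in_h - 1)
        x1 = tvm.min(x0 + 1, in_w - 1)
        wy = in_y - y0.astype("float32")
        wx = in_x - x0.astype("float32")
        return (data[b, y0, x0, c] * (1 - wy) * (1 - wx) +
                data[b, y1, x0, c] * wy * (1 - wx) +
                data[b, y0, x1, c] * (1 - wy) * wx +
                data[b, y1, x1, c] * wy * wx)

    return tvm.compute((batch, out_h, out_w, channel), _compute,
                       name="resize_bilinear")

The four Python loops disappear into the output shape tuple; the schedule then decides how those axes are actually executed.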