Potential Issue when Cloning Nodes

I was using an incorrect way to clone a Call node, which resulted in some weird behavior. Consider the following case:

from copy import deepcopy
import tvm
from tvm import relay

inputs = relay.var('data', shape=(1, 3, 224, 224), dtype='float32')
weights1 = relay.var('weights1', shape=(32, 3, 3, 3), dtype='float32')
weights2 = relay.var('weights2', shape=(32, 32, 1, 1), dtype='float32')

out1 = relay.nn.conv2d(inputs,
                       weights1,
                       kernel_size=(3, 3),
                       padding=(1, 1),
                       strides=(2, 2))
out2 = relay.nn.conv2d(out1,
                       weights2,
                       kernel_size=(1, 1),
                       padding=(1, 1))

print('=== Original Node ===')
print(out2)

new_inputs = relay.var('new_data', shape=(1, 3, 224, 224), dtype='float32')
new_weights = relay.var('new_weights', shape=(32, 32, 1, 1), dtype='float32')
new_args = [new_inputs, new_weights]


copy_out2 = deepcopy(out2)
copy_out2.args = new_args

print('=== Copied Node Args ===')
print(copy_out2.args)

print('=== Copied Node ===')
print(copy_out2)

new_out2 = relay.expr.Call(out2.op, new_args, out2.attrs, out2.type_args)
print('=== New Node ===')
print(new_out2)

I used deepcopy to clone a node (which I know is not supposed to be done) and updated the argument list of the copied node with new Relay variables, hoping to isolate this node for further analysis. However, the output of the above code is:

=== Original Node ===
v0.0.4
free_var %data: Tensor[(1, 3, 224, 224), float32]
free_var %weights1: Tensor[(32, 3, 3, 3), float32]
%0 = nn.conv2d(%data, %weights1, strides=[2, 2], padding=[1, 1], kernel_size=[3, 3]);
free_var %weights2: Tensor[(32, 32, 1, 1), float32]
nn.conv2d(%0, %weights2, padding=[1, 1], kernel_size=[1, 1])
=== Copied Node Args ===
[Var(new_data, ty=TensorType([1, 3, 224, 224], float32)), Var(new_weights, ty=TensorType([32, 32, 1, 1], float32))]
=== Copied Node ===
v0.0.4
free_var %data: Tensor[(1, 3, 224, 224), float32]
free_var %weights1: Tensor[(32, 3, 3, 3), float32]
%0 = nn.conv2d(%data, %weights1, strides=[2, 2], padding=[1, 1], kernel_size=[3, 3]);
free_var %weights2: Tensor[(32, 32, 1, 1), float32]
nn.conv2d(%0, %weights2, padding=[1, 1], kernel_size=[1, 1])
=== New Node ===
v0.0.4
free_var %new_data: Tensor[(1, 3, 224, 224), float32]
free_var %new_weights: Tensor[(32, 32, 1, 1), float32]
nn.conv2d(%new_data, %new_weights, padding=[1, 1], kernel_size=[1, 1])

As can be seen, although the args attribute appears to be updated on the copied Python object, the underlying node's arguments remain the same unless we reconstruct the node from scratch. Maybe we need a checker or something to let users know that mutating a node (even a cloned one) is forbidden?
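For anyone hitting the same thing, here is a minimal sketch of the "construct, don't mutate" pattern generalized with an expression mutator. It continues with out2 and new_args from the snippet above; ReplaceCallArgs is a hypothetical helper (not part of TVM), and it assumes ExprMutator lives in tvm.relay.expr_functor and that nodes expose same_as for identity checks:

from tvm.relay.expr_functor import ExprMutator

class ReplaceCallArgs(ExprMutator):
    """Rebuild a targeted Call node with new arguments instead of mutating it."""
    def __init__(self, target_call, new_args):
        super().__init__()
        self.target_call = target_call
        self.new_args = new_args

    def visit_call(self, call):
        if call.same_as(self.target_call):
            # Construct a fresh Call; the original node is left untouched.
            return relay.Call(call.op, self.new_args, call.attrs, call.type_args)
        return super().visit_call(call)

isolated = ReplaceCallArgs(out2, new_args).visit(out2)
print(isolated)

For this single-node case it is equivalent to the relay.expr.Call reconstruction above; the mutator form just scales to rewriting calls buried deeper in a graph.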

cc @junrushao @haichen


I think you need to create a relay.Function and call it with new_args.

Here is a simple test with Relay function:

copy_out2 = deepcopy(out2)
copy_out2.args = new_args
f = relay.Function(new_args, copy_out2)
print(f)

But it doesn't seem to work either:

v0.0.4
fn (%new_data: Tensor[(1, 3, 224, 224), float32], %new_weights: Tensor[(32, 32, 1, 1), float32]) {
  free_var %data: Tensor[(1, 3, 224, 224), float32]
  free_var %weights1: Tensor[(32, 3, 3, 3), float32]
  %0 = nn.conv2d(%data, %weights1, strides=[2, 2], padding=[1, 1], kernel_size=[3, 3]);
  free_var %weights2: Tensor[(32, 32, 1, 1), float32]
  nn.conv2d(%0, %weights2, padding=[1, 1], kernel_size=[1, 1])
}

Try:

f = relay.Function(relay.analysis.free_vars(out2), out2)
out = relay.Call(f, ...)
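Spelled out a bit more, that suggestion looks roughly like the sketch below. The new variable names are only illustrative, it assumes free_vars comes from tvm.relay.analysis (older builds expose it as relay.ir_pass.free_vars), and the call arguments have to line up with whatever order free_vars actually returns:

from tvm.relay.analysis import free_vars

# Wrap the existing expression in a Function parameterized by its free variables.
params = free_vars(out2)  # expected to be [%data, %weights1, %weights2]
f = relay.Function(params, out2)

# Fresh variables whose shapes match the original free variables.
new_data = relay.var('new_data', shape=(1, 3, 224, 224), dtype='float32')
new_w1 = relay.var('new_w1', shape=(32, 3, 3, 3), dtype='float32')
new_w2 = relay.var('new_w2', shape=(32, 32, 1, 1), dtype='float32')

# Call the wrapped function with the new arguments, in the same order as params.
out = relay.Call(f, [new_data, new_w1, new_w2])
print(out)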

Yeah, this could solve the problem, and it's the same as the reference solution I provided. As long as we create a new node by construction instead of by copying, we are good.

As Cody mentioned, given that this behavior is confusing and misleading, I do think we should either

  1. disable __setattr__ so that we don’t have such behavior, or
  2. overload __setattr__ to make the behavior correct

CC: @tqchen

What do you think?

In most cases we want the node object to be immutable, and a deep copy should not be necessary, so having a warning in __setattr__ makes sense.

I realized that this cannot be done with a change of only a few lines of code. If we add a warning to __setattr__ on NodeBase, all Nodes that assign values to attributes in their constructors will trigger the warning as well, which means we will see lots of warnings pop up in normal use cases.

On the other hand, the reasonable solution that this article suggests requires all legal attribute assignments to explicitly call __setattr__ on the super class, which implies that we would have to modify the constructors of all Nodes accordingly; that seems too tedious and unnecessary at this moment.
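For reference, the pattern that article describes looks roughly like the sketch below. NodeBase here is only a stand-in for TVM's FFI base class to illustrate why every constructor would need to change; it is not a proposed patch:

import warnings

class NodeBase:
    """Stand-in base class, only to illustrate the pattern."""
    def __setattr__(self, name, value):
        # Plain attribute assignment warns, because it only touches the
        # Python wrapper and never rewrites the underlying IR node.
        warnings.warn(
            "Setting attribute '%s' does not mutate the underlying node; "
            "construct a new node instead." % name)
        object.__setattr__(self, name, value)

class SomeNode(NodeBase):
    def __init__(self, op):
        # Legal initialization has to bypass the warning explicitly via the
        # super class, which is why every Node constructor would need edits.
        object.__setattr__(self, 'op', op)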

Hmmm yep… Let’s keep this in mind then.