Potential Issue when Cloning Nodes

I was using an incorrect way to clone a Call node, which resulted in some weird behavior. Consider the following case:

from copy import deepcopy
import tvm
from tvm import relay

inputs = relay.var('data', shape=(1, 3, 224, 224), dtype='float32')
weights1 = relay.var('weights1', shape=(32, 3, 3, 3), dtype='float32')
weights2 = relay.var('weights2', shape=(32, 32, 1, 1), dtype='float32')

out1 = relay.nn.conv2d(inputs,
                       weights1,
                       kernel_size=(3, 3),
                       padding=(1, 1),
                       strides=(2, 2))
out2 = relay.nn.conv2d(out1,
                       weights2,
                       kernel_size=(1, 1),
                       padding=(1, 1))

print('=== Original Node ===')
print(out2)

new_inputs = relay.var('new_data', shape=(1, 3, 224, 224), dtype='float32')
new_weights = relay.var('new_weights', shape=(32, 32, 1, 1), dtype='float32')
new_args = [new_inputs, new_weights]


copy_out2 = deepcopy(out2)
copy_out2.args = new_args

print('=== Copied Node Args ===')
print(copy_out2.args)

print('=== Copied Node ===')
print(copy_out2)

new_out2 = relay.expr.Call(out2.op, new_args, out2.attrs, out2.type_args)
print('=== New Node ===')
print(new_out2)

I used deepcopy to clone a node (which I know is not supposed to be done) and updated the argument list of the copied node with new Relay variables, hoping to isolate this node for further analysis. However, the output of the above code is:

=== Original Node ===
v0.0.4
free_var %data: Tensor[(1, 3, 224, 224), float32]
free_var %weights1: Tensor[(32, 3, 3, 3), float32]
%0 = nn.conv2d(%data, %weights1, strides=[2, 2], padding=[1, 1], kernel_size=[3, 3]);
free_var %weights2: Tensor[(32, 32, 1, 1), float32]
nn.conv2d(%0, %weights2, padding=[1, 1], kernel_size=[1, 1])
=== Copied Node Args ===
[Var(new_data, ty=TensorType([1, 3, 224, 224], float32)), Var(new_weights, ty=TensorType([32, 32, 1, 1], float32))]
=== Copied Node ===
v0.0.4
free_var %data: Tensor[(1, 3, 224, 224), float32]
free_var %weights1: Tensor[(32, 3, 3, 3), float32]
%0 = nn.conv2d(%data, %weights1, strides=[2, 2], padding=[1, 1], kernel_size=[3, 3]);
free_var %weights2: Tensor[(32, 32, 1, 1), float32]
nn.conv2d(%0, %weights2, padding=[1, 1], kernel_size=[1, 1])
=== New Node ===
v0.0.4
free_var %new_data: Tensor[(1, 3, 224, 224), float32]
free_var %new_weights: Tensor[(32, 32, 1, 1), float32]
nn.conv2d(%new_data, %new_weights, padding=[1, 1], kernel_size=[1, 1])

As can be seen, although the args attribute appears to be updated on the copied Python object, the underlying node's arguments remain the same unless we reconstruct the node from scratch. Maybe we need a checker or something to let users know that mutating a node (even a cloned one) is forbidden?
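For anyone hitting the same thing, here is a minimal sketch of the "construct, don't mutate" pattern generalized with an expression mutator. It continues with out2 and new_args from the snippet above; ReplaceCallArgs is a hypothetical helper (not part of TVM), and it assumes ExprMutator lives in tvm.relay.expr_functor and that nodes expose same_as for identity checks:

from tvm.relay.expr_functor import ExprMutator

class ReplaceCallArgs(ExprMutator):
    """Rebuild a targeted Call node with new arguments instead of mutating it."""
    def __init__(self, target_call, new_args):
        super().__init__()
        self.target_call = target_call
        self.new_args = new_args

    def visit_call(self, call):
        if call.same_as(self.target_call):
            # Construct a fresh Call; the original node is left untouched.
            return relay.Call(call.op, self.new_args, call.attrs, call.type_args)
        return super().visit_call(call)

isolated = ReplaceCallArgs(out2, new_args).visit(out2)
print(isolated)

For this single-node case it is equivalent to the relay.expr.Call reconstruction above; the mutator form just scales to rewriting calls buried deeper in a graph.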

cc @junrushao @haichen


I think you need to create a relay.Function and call it with new_args.

Here is a simple test with Relay function:

copy_out2 = deepcopy(out2)
copy_out2.args = new_args
f = relay.Function(new_args, copy_out2)
print(f)

But it doesn't seem to work either:

v0.0.4
fn (%new_data: Tensor[(1, 3, 224, 224), float32], %new_weights: Tensor[(32, 32, 1, 1), float32]) {
  free_var %data: Tensor[(1, 3, 224, 224), float32]
  free_var %weights1: Tensor[(32, 3, 3, 3), float32]
  %0 = nn.conv2d(%data, %weights1, strides=[2, 2], padding=[1, 1], kernel_size=[3, 3]);
  free_var %weights2: Tensor[(32, 32, 1, 1), float32]
  nn.conv2d(%0, %weights2, padding=[1, 1], kernel_size=[1, 1])
}

Try:

f = relay.Function(relay.analysis.free_vars(out2), out2)
out = relay.Call(f, ...)
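Spelled out a bit more, that suggestion looks roughly like the sketch below. The new variable names are only illustrative, it assumes free_vars comes from tvm.relay.analysis (older builds expose it as relay.ir_pass.free_vars), and the call arguments have to line up with whatever order free_vars actually returns:

from tvm.relay.analysis import free_vars

# Wrap the existing expression in a Function parameterized by its free variables.
params = free_vars(out2)  # expected to be [%data, %weights1, %weights2]
f = relay.Function(params, out2)

# Fresh variables whose shapes match the original free variables.
new_data = relay.var('new_data', shape=(1, 3, 224, 224), dtype='float32')
new_w1 = relay.var('new_w1', shape=(32, 3, 3, 3), dtype='float32')
new_w2 = relay.var('new_w2', shape=(32, 32, 1, 1), dtype='float32')

# Call the wrapped function with the new arguments, in the same order as params.
out = relay.Call(f, [new_data, new_w1, new_w2])
print(out)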

Yeah, this could solve the problem, and it's the same as the reference solution I provided. As long as we create a new node by construction instead of by copying, we are good.

As Cody mentioned, given that this behavior is confusing and misleading, I do think we should either

  1. disable __setattr__ so that we don’t have such behavior, or
  2. overload __setattr__ to make the behavior correct

CC: @tqchen

What do you think?

In most cases we want the node object to be immutable, and a deep copy should not be necessary, so having a warning in __setattr__ makes sense.

I realized that this cannot be done with a change of only a few lines of code. If we add a warning to __setattr__ on NodeBase, all Nodes that assign values to attributes in their constructors will trigger the warning as well, which means we will see lots of warnings pop up in normal use cases.

On the other hand, the reasonable solution that this article suggests requires all legal attribute assignments to explicitly call __setattr__ on the super class, which implies that we would have to modify the constructors of all Nodes accordingly; that seems too tedious and unnecessary at this moment.
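For reference, the pattern that article describes looks roughly like the sketch below. NodeBase here is only a stand-in for TVM's FFI base class to illustrate why every constructor would need to change; it is not a proposed patch:

import warnings

class NodeBase:
    """Stand-in base class, only to illustrate the pattern."""
    def __setattr__(self, name, value):
        # Plain attribute assignment warns, because it only touches the
        # Python wrapper and never rewrites the underlying IR node.
        warnings.warn(
            "Setting attribute '%s' does not mutate the underlying node; "
            "construct a new node instead." % name)
        object.__setattr__(self, name, value)

class SomeNode(NodeBase):
    def __init__(self, op):
        # Legal initialization has to bypass the warning explicitly via the
        # super class, which is why every Node constructor would need edits.
        object.__setattr__(self, 'op', op)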

Hmmm yep… Let’s keep this in mind then.