[PyTorch] [Frontend] graph input names can change using loaded torchscript

That sounds reasonable; maybe we could at least check that the inputs are valid “shape tuples”? Though this is probably a common check that could be shared across frontends.

Yes, we can repurpose and rename _check_input_names to do the necessary input validation.

For other frontends, I also remember being annoyed at having to supply input names. Unfortunately for them it is too late to fix. We shouldn’t repeat the same mistake :wink:

The input names are really annoying. I think one use case of the name-to-shape dict is to avoid getting the inputs in the wrong order. How hard is it for users to supply the inputs in the correct order? And is it possible to connect the names after _run_jit_passes?

PyTorch users should know the correct order of inputs, because a PyTorch module’s forward(...) method expects its inputs in the correct order (otherwise they could not run any training code).

Yes, it is very straightforward as long as the user-supplied input shape list is in the correct order. Something like below:
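
A rough sketch of the idea (this is an assumed reconstruction, not the original snippet; graph and input_shapes come from the frontend context, and relay.var stands in for the frontend’s var creation):

    # Connect post-_run_jit_passes input names to relay vars, relying
    # only on the user-supplied shape list being in the correct order.
    from tvm import relay

    ir_inputs = list(graph.inputs())[1:]  # skip the "self" input
    input_vars = {i.debugName(): relay.var(i.debugName(), shape=shape)
                  for i, shape in zip(ir_inputs, input_shapes)}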

Thanks for the clarification. I think this change makes sense to me.

@jjohnson-arm Do you want to send a PR about it? Otherwise I will, no problem

Yes - I can send a PR.

Hmm… one issue we still have if we do this is that the user still needs to know the input names to set the input data for the relay model - i.e. relay_model.set_input(input_name, data)

So I presume we still need some way of sorting that out: either we need a way of querying the relay_model for the names of its inputs - is there something like that already?

Or maybe it is just better to supply the post-run_jit_passes input names via the original call, the way you were doing it originally.

Oh, you are right… I also realized that in a typical AOT deployment use case, we just load compiled models directly from exported libs, so there are no torchscript or relay models around. But users still need to keep the input names somehow.

I agree that the ideal solution is for compiled runtime modules to support querying the list of input names in the correct order, but right now there is no way to do that. There is GraphRuntime::GetInputIndex(...) (used in set_input), but we need an “inverse” of this function.
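
For illustration, the missing piece would look something like this (purely hypothetical; these runtime APIs do not exist yet):

    # Hypothetical "inverse" of GetInputIndex: recover the ordered input
    # names from a loaded graph runtime module, so deployment code would
    # not need to keep them around separately.
    def get_input_names(runtime_module):
        return [runtime_module.get_input_name(i)  # hypothetical API
                for i in range(runtime_module.get_num_inputs())]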

A solution that does not touch the runtime is to ask users to give us a list of (input_name, input_shape) pairs, and to override the Torch IR input names with the names provided by users. Users can just choose arbitrary names (“input0”, “input1”, etc.).

I think this is better than returning whatever names Torch chooses from our frontend and asking users to somehow keep those names around until deployment.

Ok. So just to see if I understand, you are proposing:

  • User supplies something like: [('input0', [1,2,3]), ('input1', [4,5])]
  • from_pytorch() changes the relay_graph to use these names on conversion
  • User then uses the same names when using compiled models.

Is that right?
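
Something like this, as a sketch of the flow (assuming from_pytorch is extended to take the (name, shape) pairs; runtime_module and the data variables are placeholders):

    # User picks arbitrary names and supplies ordered (name, shape) pairs.
    input_shapes = [('input0', [1, 2, 3]), ('input1', [4, 5])]
    mod, params = relay.frontend.from_pytorch(script_module, input_shapes)
    # ... compile, export, reload ...
    # At deployment, the same user-chosen names are used:
    runtime_module.set_input('input0', data0)
    runtime_module.set_input('input1', data1)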

I have some code working as above, but I am using an input conversion map (created after reading the input_shapes) in _get_op_inputs() to convert the op inputs to the user-supplied names.

I am wondering if it would be possible just to append some conversion entries to output_index_map instead, which would achieve the same thing?

Yes, exactly right.

I’m not completely sure what you mean here, but since _get_op_inputs looks up the original Torch IR input names, we need to either overwrite the input names or add additional entries to outputs and output_index_map.

Overwriting can be done with the setDebugName method. For the latter solution, we can add

    # Alias each original Torch IR input name to its relay input var,
    # so that _get_op_inputs can find the user-named vars.
    for torch_input_name, relay_var in zip(get_graph_input_names(script_module),
                                           input_vars.values()):
        output_index_map[torch_input_name] = len(outputs)
        outputs.append(relay_var)

after the input vars are created.

I realized that output_index_map is completely redundant if we make outputs a dict instead of a list, because outputs is always accessed via output_index_map like this (here, outputs is a list):

outputs[output_index_map[var_name]]

Instead it should be just outputs[var_name].
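
To illustrate with a toy before/after (placeholder strings instead of real relay expressions):

    # Before: a list plus a separate name-to-index map.
    outputs = []
    output_index_map = {}
    output_index_map["input.1"] = len(outputs)
    outputs.append("relay_var_for_input.1")
    assert outputs[output_index_map["input.1"]] == "relay_var_for_input.1"

    # After: a single dict keyed by node name does the same job.
    outputs = {"input.1": "relay_var_for_input.1"}
    assert outputs["input.1"] == "relay_var_for_input.1"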

@jjohnson-arm Does this make sense? If yes, feel free to remove output_index_map and make outputs a dict from node name to relay output values.

Thanks for the comments - they have helped to shed light on things - :bulb:!

I agree with the removal of output_index_map, though my suggestion was actually to use it as a redirect to the same entries for the user-supplied names. I.e. it would have entries for the user-specified names (from _get_relay_input_vars), and then you add some redirects to the same outputs list entries for the pytorch names.

But as you say, we could just add some extra output entries if we turn outputs into a dictionary, or just use setDebugName to change the graph - I will have a look into both.

FYI, my initial method (just as a trial):

  • Read in the user input_shapes and create a simple conversion map from pytorch names to user names, e.g. {'input.1': 'input0'}
  • _get_relay_input_vars is still used to construct the input vars from the user input_shapes, and these get added to outputs - so the user names are already there
  • in _get_op_inputs I use the new conversion map to convert the pytorch names (from _get_input_names) into the user names before looking them up in output_index_map - here’s where I could have just used output_index_map itself instead of an extra map - I was being a bit overly cautious (a sketch of the map follows below).
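
Roughly, the conversion map looks like this (illustrative names, not the exact PR code; graph_input_names is the list of original Torch IR input names):

    # Map original Torch IR input names to the user-supplied names,
    # relying on both lists being in the same order.
    name_map = {torch_name: user_name
                for torch_name, (user_name, _)
                in zip(graph_input_names, input_shapes)}

    def to_user_name(torch_name):
        # Non-input nodes keep their original names.
        return name_map.get(torch_name, torch_name)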

Posted PR - https://github.com/apache/incubator-tvm/pull/5204

Just to warm this up a bit. While graph input debug names can change, PyTorch does keep the stem stable. This is used e.g. for script_module.code and to give an error for missing inputs (try script_module()).

Unfortunately I think this will not help if you have two inputs called input.0 and input.1 (this is allowed). These will get remapped to something new like input.X and input.Y, and it would be guesswork to figure out which is which.

Unless I am missing something?

Actually, this can happen in the body of the function, but not here, because the inputs come from a function signature. You can print traced_module.code to witness the translation (that is where I tracked down the function reproducing the non-processed names). Another place where you can get the argument names directly and programmatically is the schema of the traced module’s forward method: [a.name for a in traced_module.forward.schema.arguments].
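
For example (a small self-contained demo):

    import torch

    class M(torch.nn.Module):
        def forward(self, x, y):
            return x + y

    traced = torch.jit.trace(M(), (torch.rand(2), torch.rand(2)))
    # The schema keeps the stable argument names from the signature.
    print([a.name for a in traced.forward.schema.arguments])
    # -> ['self', 'x', 'y']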

I haven’t fully investigated what it would take to make PyTorch present the signature in a way retrievable by inspect.signature (it currently isn’t available), but that might be the best way to present it.

That said, people appear to prefer the current API with its requirement to pass names and shapes regardless of whether they are already provided by the module, so I guess it’ll have to stay that way.

I just thought that it would help to analyze what is going on in the JIT to make informed decisions about how to convert models.

Best regards

Thomas

While we’re on the topic of names: the params currently are just numbered. I must admit I’d think it’d be prettier if we used the state_dict names instead. What do you think?

Sure, that sounds good. Since params are referenced by their numeric node IDs in the Torch IR and we translate line by line, we still need an association from numeric ID to state_dict key name.

The state_dict key name is available here as full_attr, so you can use this name when creating the Var.

full_attr_node_name is the numeric ID corresponding to full_attr. You need to add this mapping to the outputs variable.
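
A sketch of what that could look like (make_param_var is a hypothetical helper; the real change would live inside the frontend’s conversion loop):

    import tvm
    from tvm import relay

    def make_param_var(full_attr, full_attr_node_name, tensor, outputs, params):
        # full_attr is the state_dict key (e.g. "conv1.weight"); use it
        # as the relay Var name instead of a generated number.
        params[full_attr] = tvm.nd.array(tensor)  # tensor: a numpy array
        var = relay.var(full_attr, shape=tensor.shape)
        # Map the Torch IR node ID to the new Var so later ops find it.
        outputs[full_attr_node_name] = var
        return var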