[DISCUSS] Runtime Array Containers Array/ADT/String

FrozenGene · December 20, 2019, 11:40am

My opinion is we don’t need to support to move from std::vector. When we design it, we should encourage developers using this. From the experience, we like to use std::string, but we like to use XXX::Array / XXX::SmallVector and so on.

wweic · December 28, 2019, 8:22pm

Thanks @FrozenGene for your input! Looks like you vote for a full feature array implementation. I think it will also cause less confusion if we can program directly in tvm’s containers as much as possible.

@tqchen For vector/string, if we want to support move from std containers without copying, we have to assume the original container object’s lifecycle is longer than our reference object. But I don’t think it’s always true. For example, std::string is usually short lived. We might have to copy the content anyway, but we can reuse InplaceArrayBase actually now to reduce one extra indirect memory access.

tqchen · December 28, 2019, 9:25pm

Re: life-cycle management in move semantics, we will need to explicitly ask user to move, which means std::move(mystr) would results in a move and the original container is moved to FromStd container and thus its lifecycle will be the same as the object. Otherwise, it will results in a copy, and we invokes the normal inplacearray allocation(but we will need to store the same pointer for compact reasons). So the main question is whether the additional 8 bytes cost worth it.

Here are some additional thoughts that I think worth discussing.

Another thing about String, which might worth considering is whether we want to lazily memoize hash, given string is immutable here.

Given that both Array and ADT are immutable(under the hood, as we do copy on write for array). It is interesting to ask whether we want to simply use Array to represent a Tuple. We will face this issue when we enhance the language frontend to do the conversion. For example, when a tuple is passed from python side, it gets converted into ADT, but when the implementation might requests an Array instead. Having a different container means we need to do forced conversion(either copy or move). The only complication might be to teach the relay vm to recognize these additional runtime objects

wweic · December 28, 2019, 11:13pm

Re: move support, yes we can get rvalue reference of std::string for example, we can not simply save the data pointer though, we also need to set the rvalue’s data pointer to null, otherwise the rvalue’s destructor will return the memory back to allocator. But the data pointer is not public I think, so we can not set it to null. Let me know if this explains the problem.

If space efficiency is not a concern, we can add cached hash value.

For ADT object, we need to know the tag of the object(to identify which variant it belongs to). This is not needed by Array. Unifying the two will have waste for Array objects I think.

tqchen · December 29, 2019, 12:44am

I do not mean unify ADT and array, but use Array to store tuples(as opposed to ADT with tag=0), for the rvalue move, we can simply move the std::string to a local container

Here is an example code that supports move(string should be similar) for Array

class ArrayObj::FromStd : public ArrayObj {
 private:
   std::vector<ObjectRef> data_;

   explicit FromStd(std::vector<ObjectRef>&& data)
      : data_(std::move(data)) {
      data_ = data_.data();
      CHECK_LE(data_.capacity(), UINT_MAX);
      size_ = static_cast<uint32>(data_.size());
      capacity_ = static_cast<uint32>(data_.capacity());
   }
   friend ObjectPtr<ArrayObj> ArrayFromVector;
};

wweic · December 29, 2019, 1:16am

Oh, I think it makes sense to use Array to store tuples. They are often used interchangeably.

Thanks for the clarification, I was under the impression we want to get rid of std containers in our containers object. So we just need to ensure the head of our object to be POD-C compatible.

tqchen · December 29, 2019, 6:32pm

yes, because after the construction, the reference will no longer see the non-header part, nor do they need to(as they can always call the deleter to delete)

tqchen · January 2, 2020, 3:41am

Combining the opinions so far, seems that we start to agree on building a fully featured Array and not support move(given it is not a common usecase). I also start to agree on this.

I am still not too sure about String given that there are quite a few cases where string are still constructed natively.

Supporting move from string allows interesting cases like constructing a long std::string and move into it, or move language native string(like python string) into the container.

On the other hand, directly support an inlined string is also not bad for most cases where number of chars are smaller than 8 or 16 bytes. Note that we need to cost these extra bytes(or even more) anyway to store the original data structure, we might as well just copy the content over. So the benefit for move might only occur for very large bytearrays.

Finally, having a data* field to support move won’t prevent us to use a smarter create array inline and copy the content or move on the size of the string being constructed. Although it maybe overkill. So the may difference would be whether we want to pay 8 bytes for such flexibilibity.

Given that this decision won’t affect the user API, but mainly the ABI(the behavior when we constructed a compiler that directly mainpulates the String) perhaps we can make a call and move forward for now. Also cc @junrushao @haichen @ajtulloch @jroesch for their thoughts.

wweic · January 2, 2020, 3:02am

So for String, let’s implement the move only version, the inline string can be added later if needed.

tqchen · January 2, 2020, 4:13am

sounds good, if we decided to pay the cost of data*, then the followup decision of which variation can be deferred.

junrushao · May 15, 2020, 3:08am

Follow up PR for runtime::Array: https://github.com/apache/incubator-tvm/pull/5585

junrushao · June 8, 2020, 8:04am

Follow up PR for runtime::Map: https://github.com/apache/incubator-tvm/pull/5740

xiandi · June 24, 2020, 9:47am

Hi, can I use StringList or Map<String, String> in Relay? example ？

t-vi · June 24, 2020, 2:07pm

My impression is that one drawback of this switch is that in the Python interface, a lot of conversions to standard types seem to be needed. When using shapes outside of TVM, one has to convert Arrays of IntImm to lists of int. Even when feeding shapes back to TVM I seem to need a lot of list(…) casting. (And certainly when I want to use list manipulations.)

Now, I’m not sure what the best way of action is, but I sometimes wish that we could just convert to standard Python objects when we hand them out (although I recognize that this might be a lossy conversion).

Best regards

Thomas

tqchen · June 24, 2020, 2:56pm

As an alternative, we can try to add enough overloads to make sure that the object itself behave like an list and number so such cast is not necessary. We have already done that for string(by having String to subclass str)

junrushao · June 24, 2020, 6:15pm

I think you can just write Map<String, String> or Array<String>. Let me know if any issues occurred

xiandi · June 28, 2020, 6:32am

Thank you very much for answering. Can I use in this way: a = relay.var("a", MapType); b = a["key"];? Please correct me if there is any mistake.

junrushao · June 28, 2020, 5:37pm

Unfortunately no, it is right now in runtime, not IR…

xiandi · June 29, 2020, 2:31am

Are there any plans to support these features in IR?

xiandi · August 10, 2020, 3:13pm