[External Codegen] Constant tensors in c-codegen

Hi, I’m attempting to integrate the Arm Compute Library using the external c-codegen route, but I’m running into an issue within codegen where I would like to declare weights as constants. Currently, a Relay sub-graph used in c-codegen is expected to have weights (and other tensors) declared as variables rather than as constants. I assume this is so that they can be treated like normal inputs to the sub-graph. However, it means I cannot perform passes such as constant folding of layout_transform operators on the sub-graph.

I’ve been looking into ways to overcome this, but the only solution I can think of is to output these tensors directly into the codegen stream. This would be fine for very small tensors; however, weight tensors can get very large for graphs like VGG16. I was wondering if there is any way around this?

The reason we did this is that the newly created function expects its params to be variables. I can think of two approaches to solving this problem:

  • Add a constant propagation pass to propagate the constants to the created functions
  • Record the newly created Vars and their corresponding constant values, then run BindParamsByName on each of the new functions.

The second approach should be easier. I am working on outlining the created functions to the module level and inlining them back later, so that Relay passes won’t touch them. After this change, I can probably come back to tackle this problem.

Update: actually, both approaches could use BindParamsByName.

Thanks for the response! After reading my initial question again I don’t think I explained the issue very well, sorry about that. I’m actually already at the stage of having used BindParamsByName on the function, which produces Relay that looks something like this:

%6 = fn (%acl_input1: Tensor[(1, 226, 226, 64), float32], Compiler="acl", ExternalSymbol="acl_19", Primitive=1) -> Tensor[(1, 224, 224, 64), float32] {
    nn.conv2d(%acl_input1, meta[relay.Constant][2], padding=[0, 0, 0, 0], channels=64, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO")
};

However, when doing the actual codegen I’m not sure how the constant should be represented in C++. It could be a hard-coded vector, e.g. ACL_Conv2d(acl_input0, std::vector({1, 2, ...}), ...), but this wouldn’t be practical for a large tensor. Is there instead a way to deal with constants a little like normal inputs, so we could call, for example, ACL_Conv2d(acl_input0, acl_params0), where acl_params0 is a pointer to a params input set via set_input(**params)? I hope this makes more sense.
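For illustration, here is a hedged sketch of what the hard-coded route produces in the generated source (the names `const_0` and `use_constant` are hypothetical stand-ins, not real ACL calls): every element of the weight tensor is spelled out as an initializer, which is why this stops being practical for VGG16-sized tensors.

```c
#include <stddef.h>

/* Hypothetical generated code: the whole weight tensor is baked
   into the source as an initializer list. Fine for a handful of
   values, unmanageable for millions of them. */
static const float const_0[4] = {1.0f, 2.0f, 3.0f, 4.0f};

/* Stand-in for a call like ACL_Conv2d(acl_input0, const_0, ...):
   here we just sum the weights to show the constant is usable. */
float use_constant(void) {
    float acc = 0.0f;
    for (size_t i = 0; i < 4; ++i)
        acc += const_0[i];
    return acc;
}
```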

If I understand correctly, you wish constants could be optimized by Relay passes like constant folding; however, you don’t want to hard-code the optimized constants into the generated C function, but rather want them serialized to disk.

If the above summary is correct, then you should handle those constants in your codegen. set_input(**params) doesn’t make sense, because users are not supposed to know about those “parameters”. One straightforward way is to include DLPack. Specifically, when generating a C function for a Relay subgraph, you serialize the DLPack constant arrays to a file and keep it alongside the generated TVM runtime module. At runtime, your generated C function looks for the constant file and loads the constants back.
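A minimal sketch of that flow in plain C (the file path and function names are made up for illustration; a real implementation would also serialize DLTensor metadata such as shape and dtype): the build side dumps the raw values next to the generated module, and the runtime side loads them back before the first invocation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build side: dump a constant tensor's raw data to a file that
   ships alongside the generated runtime module. Returns 0 on
   success, -1 on failure. */
int save_constant(const char* path, const float* data, size_t n) {
    FILE* f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, sizeof(float), n, f);
    fclose(f);
    return written == n ? 0 : -1;
}

/* Runtime side: the generated C function loads the constants
   back from disk. The caller owns (and frees) the buffer. */
float* load_constant(const char* path, size_t n) {
    FILE* f = fopen(path, "rb");
    if (!f) return NULL;
    float* data = malloc(n * sizeof(float));
    if (data && fread(data, sizeof(float), n, f) != n) {
        free(data);
        data = NULL;
    }
    fclose(f);
    return data;
}
```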

Your summary was correct. Thanks for the suggestion; it makes sense and I’ll give it a try. One immediate question, for someone wanting to use this out of the box, is how you would specify the output path for the serialized file during the build phase? I don’t believe the build process at this point has any concept of where the user wants to export their compiled model, and therefore we have no idea where the serialized params should be written.

What I mean by set_input(**params) is: if we have a call to build a module:

graph, lib, params = relay.build(module, ..., params=params)

A set of initial params is input, presumably some manipulation happens behind the scenes, and finally the transformed params are output. In this process, is there any way an external codegen could add to the params that are output, so that all we have to do when we come to load and run the module is:

mod.set_input(**params)
mod.run()

I’m not sure if this is possible or even makes sense to do, but I think it would be easier than dealing with a temporary file.

The set_input approach won’t be easier, since in my opinion it violates the design philosophy. If users have bound the parameters, the parameters should already be bound to the model, and you don’t have to (and cannot) set them at runtime.

On the other hand, the question regarding the output path seems reasonable, although I don’t have a good solution for it yet…

Thanks, I agree with your point.

Apologies for bringing up an old post with another question - I didn’t think it warranted a new one, and it’s related to this topic.

Using the PR that propagates constants to subgraphs (https://github.com/apache/incubator-tvm/pull/5094), I’ve run into the stack overflow issue mentioned in the comments when trying to compile VGG16:

// Define a const buffer: float const_0[64] = {1.0, 2.0, ...};
//
// Technically, you may need: static float* const_0 = (float*)malloc(4 * 64)
// to avoid possible stack overflow.

In the stack-array example you’re able to use an initializer list to populate the array with values; I was just wondering whether there was a similar approach you had in mind for the array allocated on the heap?

cc @zhiics @comaniac

No, I think you could just assign the values one by one. One optimization you could consider is making sure you assign the values only on the first visit.
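A hedged sketch of what that could look like in the generated C (the names `const_0` and `get_const_0` are hypothetical): the constant lives on the heap rather than the stack, and a guard ensures the one-by-one assignments run only on the first visit.

```c
#include <stdlib.h>

static float* const_0 = NULL;

/* Generated accessor: allocate on the heap (avoiding the stack
   overflow a large local array would cause) and fill in the
   values only on the first call. */
float* get_const_0(void) {
    if (const_0 == NULL) {
        const_0 = malloc(4 * sizeof(float));
        if (const_0 == NULL) return NULL;
        /* The codegen would emit one assignment per element:
           const_0[0] = ...; const_0[1] = ...; and so on. */
        const_0[0] = 1.0f;
        const_0[1] = 2.0f;
        const_0[2] = 3.0f;
        const_0[3] = 4.0f;
    }
    return const_0;
}
```

Note this is exactly the pattern questioned below: for ~13,000 values the emitted assignments make the intermediate C file very large, which is what motivates serializing to a separate file instead.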

Thanks, this is fine for a small number of values, but doing it for ~13,000 values is where I think things get interesting. Am I correct in thinking that writing a[0] = 2; a[1] = 1; ... for every value will cause the intermediate C file that is generated to explode in size? If that’s the case, I don’t think there is any option other than serializing the constants and saving them to a separate file?

The large constant tensor issue was also considered before, but since we have no idea how developers will deal with constant tensors, we leave this part transparent to the developers. As a result, you can do anything you think is better, including writing them out to a separate file.

I see, thanks for the help!

I’ve had a chance to look at this now, and it seems like it’s quite a fundamental issue with C codegen, not just ACL. It will make a lot of compile-time optimisations impossible, as there’s no reasonable way to handle large constant tensors in the codegen. This will be especially prevalent when there’s a need to do a layout transform between Relay and the external codegen.

I don’t have an immediate solution to this, but have discussed it in some detail with @lhutton1. Most ‘obvious’ solutions don’t work because of the novel way CSourceModules are built (the source file is created first and only gets compiled during export_lib).

Can you take another look at this, @comaniac and @zhiics? Our current hack of writing them out to a hard-coded location won’t be acceptable long term.

@mbaret Have you considered using a different codegen than CSource? To deal with large constants, I think a binary-serialization-based codegen is a good fit.

@masahi Thanks for the suggestion; however, since ACL is a C++ library, ideally we would want to be able to cross-compile our codegen before using it on the remote device. I don’t think we can assume the remote device has its own toolchain to compile the codegen it receives.

Hmm, for my use case I simply serialize a Relay subgraph into some binary format, pass the binary to the runtime module, and deserialize it there. Then I can execute this graph with the arguments I receive from TVM in whatever way I like, including offloading to DNNL. This week I integrated the upstream changes and was delighted to see ConstantNode in my codegen (thanks @zhiics!!). I was able to simply serialize the constants together with the graph structure in the same binary.

How are you compiling? We could serialize the graph, but we’d then need to codegen the relevant ACL API calls on the remote and compile it into something that can be executed. We can’t do that without a toolchain though which can’t be guaranteed.

“The executor” part, including the API calls to DNNL, is defined in another lib that is built outside of TVM and linked into my TVM build. My TVM external runtime passes the binary, or the deserialized graph representation, together with the arguments from TVM to that lib, and the lib knows how to execute the graph. The output from the lib is returned to the TVM runtime.

I don’t need to generate any C code or invoke external toolchains like gcc. But I’m still at the experimentation stage, so I’m not sure if this flow is final.
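For concreteness, a toy sketch of that flow in C (the blob layout and the names `pack_blob`/`blob_consts` are invented for illustration; a real integration would use a proper format such as JSON, protobuf, or DLPack): the build side packs a serialized graph representation and its constants into one length-prefixed binary, and the runtime side recovers the constants from it.

```c
#include <stdlib.h>
#include <string.h>

/* Toy blob layout:
     [uint32 graph_len][graph bytes][uint32 n_floats][float data]
   This only illustrates shipping graph structure and constants
   together in a single binary. Caller frees the returned buffer. */
unsigned char* pack_blob(const char* graph, const float* consts,
                         unsigned n, size_t* out_len) {
    unsigned glen = (unsigned)strlen(graph);
    *out_len = 2 * sizeof(unsigned) + glen + n * sizeof(float);
    unsigned char* buf = malloc(*out_len);
    if (!buf) return NULL;
    unsigned char* p = buf;
    memcpy(p, &glen, sizeof glen);  p += sizeof glen;
    memcpy(p, graph, glen);         p += glen;
    memcpy(p, &n, sizeof n);        p += sizeof n;
    memcpy(p, consts, n * sizeof(float));
    return buf;
}

/* Runtime side: skip over the graph section and copy the constant
   array out of the blob. Returns the element count, or -1 if the
   caller's buffer is too small. */
int blob_consts(const unsigned char* buf, float* out, unsigned max_n) {
    unsigned glen, n;
    memcpy(&glen, buf, sizeof glen);
    buf += sizeof glen + glen;
    memcpy(&n, buf, sizeof n);
    if (n > max_n) return -1;
    memcpy(out, buf + sizeof n, n * sizeof(float));
    return (int)n;
}
```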

Ah, I understand now. We’ll have a look at how viable that’ll be for ACL. Thanks for the suggestion!


I think the TensorRT integration by AWS works in a similar way. If I remember correctly, they use JSON instead of binary.
