[External Codegen] Constant tensors in c-codegen

Hi, I’m attempting to integrate Arm Compute Library using the external c-codegen route, but I’m running into an issue in codegen where I would like to declare weights as constants. Currently, a Relay sub-graph used in c-codegen is expected to have weights (and other tensors) declared as variables rather than constants. I assume this is so they can be treated like normal inputs to the sub-graph. However, it means I cannot perform passes like constant folding of layout_transform operators on the sub-graph.

I’ve been looking into ways to overcome this, but the only solution I can think of is to output these tensors directly into the codegen stream. This would be OK for very small tensors; however, weight tensors can get very large for networks like VGG16. Is there any way around this?

The reason we did this is that the newly created function expects its params to be variables. I can think of two approaches to solving this problem:

  • Add a constant propagation pass to propagate the constants to the created functions
  • Record the newly created Vars and their corresponding constant values. We can run BindParamsByName on each of the new functions.

The second approach should be easier. I am working on outlining the created functions to the module level and inlining them back later so that Relay passes won’t touch them. After this change, I can probably come back to tackle this problem.

Updated: Actually, both approaches could use BindParamsByName.
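The bookkeeping for the second approach can be sketched without TVM internals. This is a minimal illustration of the record-and-bind idea, not TVM API; `partition`, `bind_params_by_name`, and the token-list representation are all stand-ins:

```python
# Sketch of the second approach (hypothetical names, not TVM API):
# while partitioning, record each newly created var and the constant it
# replaced; later, substitute the constants back, mimicking BindParamsByName.

def partition(body, constants):
    """Replace constant tokens with fresh var names and record the mapping."""
    bound = {}
    new_body = []
    for tok in body:
        if tok in constants:
            var = "v%d" % len(bound)
            bound[var] = constants[tok]   # remember var -> constant value
            new_body.append(var)
        else:
            new_body.append(tok)
    return new_body, bound

def bind_params_by_name(body, bound):
    """Inverse step: substitute the recorded constants back for their vars."""
    return [bound.get(tok, tok) for tok in body]

body = ["input", "weight", "bias"]
consts = {"weight": [1.0, 2.0], "bias": [0.5]}
new_body, bound = partition(body, consts)      # constants become v0, v1
restored = bind_params_by_name(new_body, bound)  # constants substituted back
```

Running partition and then bind_params_by_name round-trips the constants, which is exactly why recording the var-to-constant mapping is enough.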

Thanks for the response! After reading my initial question again I don’t think I explained the issue very well, sorry about that. I’m actually already at the stage of having used BindParamsByName on the function to obtain Relay that looks something like this:

%6 = fn (%acl_input1: Tensor[(1, 226, 226, 64), float32], Compiler="acl", ExternalSymbol="acl_19", Primitive=1) -> Tensor[(1, 224, 224, 64), float32] {
    nn.conv2d(%acl_input1, meta[relay.Constant][2], padding=[0, 0, 0, 0], channels=64, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO")
}

However, when doing the actual codegen I’m not sure how the constant should be represented in C++. It could be a hard-coded vector, e.g. ACL_Conv2d(acl_input0, std::vector({1, 2, ...}), ...), but this wouldn’t be practical for a large tensor. Is there instead a way to treat constants a little like normal inputs, so we could call, for example, ACL_Conv2d(acl_input0, acl_params0), where acl_params0 is a pointer to a params input set via set_input(**params)? I hope this makes more sense.

If I understand correctly, you want constants to be optimized by Relay passes like constant folding. However, you don’t want the optimized constants hard-coded into the generated C function; you want them serialized to disk instead.

If the above summary is correct, then you should handle those constants in your codegen. set_input(**params) doesn’t make sense here because users are not supposed to know about those “parameters”. One straightforward way is to use DLPack. Specifically, when generating a C function for a Relay subgraph, you serialize the DLPack constant arrays to a file and keep it alongside the generated TVM runtime module. At runtime, your generated C function looks for the constant file and loads the arrays back.
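The save-beside-the-module scheme can be sketched with plain files. This is a stdlib-only illustration of the idea, not TVM or DLPack API; the `.bin` naming convention and the `save_constants`/`load_constant` helpers are assumptions:

```python
# Sketch of the suggested scheme: at codegen time, dump each constant tensor
# as raw little-endian float32 next to the runtime module; at runtime, the
# generated C function would fread() the same bytes back by symbol name.
import array
import os
import tempfile

def save_constants(out_dir, constants):
    """constants: dict mapping symbol name -> list of floats."""
    for name, values in constants.items():
        with open(os.path.join(out_dir, name + ".bin"), "wb") as f:
            array.array("f", values).tofile(f)  # raw float32, no header

def load_constant(out_dir, name, count):
    """Mirror of what the generated C function would do at runtime."""
    buf = array.array("f")
    with open(os.path.join(out_dir, name + ".bin"), "rb") as f:
        buf.fromfile(f, count)
    return list(buf)

out_dir = tempfile.mkdtemp()  # stands in for the module's output directory
save_constants(out_dir, {"acl_const_0": [1.0, 2.0, 3.0]})
loaded = load_constant(out_dir, "acl_const_0", 3)
```

A real implementation would store shape and dtype metadata too (which is what a DLPack/NDArray serialization format gives you for free); the sketch only shows the file-beside-the-module layout.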

Your summary was correct. Thanks for the suggestion; it makes sense and I’ll give it a try. One immediate question for someone wanting to use this out of the box: how would you specify the path to output the serialized file during the build phase? I don’t believe the build process at this point has any concept of where the user wants to export their compiled model, so we have no idea where the serialized params should be output.

What I mean by set_input(**params) is this: suppose we have a call to build a module:

graph, lib, params = relay.build(module, ..., params=params)

A set of initial params is passed in, presumably some manipulation happens behind the scenes, and finally the transformed params are output. In this process, is there any way an external codegen could add to the params that are output, so that all we have to do when we come to load and run the module is:

module.set_input(**params)
I’m not sure if this is possible or even makes sense to do, but I think it would be easier than dealing with a temporary file.

The set_input approach won’t be easier, since it violates the design philosophy in my opinion. If users have bound the parameters, the parameters are already bound to the model and you don’t have to (and cannot) set them at runtime.

On the other hand, the question regarding the output path seems reasonable, although I don’t have a good solution for it yet…

Thanks, I agree with your point.

Apologies for bringing up an old post for another question; I didn’t think it warranted a new one, and it is related to this topic.

Using the constant-propagation-to-subgraphs PR (https://github.com/apache/incubator-tvm/pull/5094), I’ve run into the stack overflow issue mentioned in its comments while trying to compile VGG16:

// Define a const buffer: float const_0[64] = {1.0, 2.0, ...};
// Technically, you may need: static float* const_0 = (float*)malloc(64 * sizeof(float));
// to avoid a possible stack overflow.

In the stack-array example you can use an initializer list to populate the array with values; I was just wondering whether you had a similar way in mind for an array allocated on the heap?

cc @zhiics @comaniac

No, I think you could just assign the values one by one. One optimization you could consider is making sure you assign the values only on the first visit.

Thanks, this is OK for a small number of values, but doing it for ~13,000 values is where things get interesting. Am I correct in thinking that writing a[0] = 2; a[1] = 1; ... for every value will cause the generated intermediate C file to explode in size? If so, I don’t think there is any option other than serializing the constants and saving them to a separate file?

The large-constant-tensor issue was also considered before, but since we have no idea how developers will want to deal with constant tensors, we leave this part up to the developers. As a result, you can do anything you think is better, including writing them out to a separate file.
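A quick back-of-the-envelope comparison supports moving large constants out of the C source. This sketch uses illustrative numbers (a fixed dummy value per element, not real weights) to compare one generated assignment statement per element against raw float32 in a side file:

```python
# Rough size comparison for a ~13,000-element float tensor: emitting one C
# assignment per value versus storing raw 4-byte floats in a separate file.
n = 13000

# One generated C statement per element, e.g. "const_0[42] = 0.10000000f;".
# The 0.1 dummy value stands in for an arbitrary weight.
assignments = "".join("const_0[%d] = %.8ff;\n" % (i, 0.1) for i in range(n))
source_bytes = len(assignments.encode("ascii"))

# The same data stored as raw float32 in a binary side file.
binary_bytes = 4 * n

ratio = source_bytes / binary_bytes  # how much larger the C source is
```

Each assignment line costs roughly 26+ characters versus 4 bytes for the raw value, so the generated source is several times the size of the binary file, before even counting the compile-time cost of such a translation unit.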

I see, thanks for the help!