Tutorial for enabling auto-tuning on a new op

I am working on enabling auto-tuning for softmax based on the discussion here, but am finding it difficult to know exactly what code to modify. From some digging, it seems that I have to modify OP2TOPI in extract_from_program in relay_integration.py, and a handful of places in topi_integration.py.

Further, I know that I have to modify the decorators on top of the existing softmax compute and schedules, as well as define a “workload”, but I am still looking for exactly where that is done.

Is there a tutorial available for this? It seems like a common enough scenario to warrant one. Or, would someone be able to point me in the right direction for implementing this? After successfully going through the process, I would be happy to write the tutorial.

Here is a case-specific tutorial, but this is not what you want if you plan to submit a PR for a tunable template.

For adding a tunable template, I don’t think we have a tutorial yet. I would suggest following one TOPI example such as dense on CUDA (conv2d has too many schedule templates for different cases and I think it’s harder to trace).

For example, in this function we define the dense compute and register it with AutoTVM for the CUDA target. Meanwhile, this function defines a schedule template, which is also registered with AutoTVM for scheduling dense ops on CUDA.
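To make the registration idea concrete, here is a minimal pure-Python sketch of "register a compute and a schedule under the same task name, then look them up later". It is only a toy: register_compute, register_schedule, lookup, and the task name "toy_dense.cuda" are hypothetical names made up for illustration, not AutoTVM APIs, and the real decorators in TOPI do considerably more.

```python
# Toy registry. None of these names are AutoTVM APIs; they only
# illustrate registering a compute/schedule pair under one task name.
_COMPUTES = {}
_SCHEDULES = {}


def register_compute(task_name):
    """Hypothetical decorator: store a compute function under task_name."""
    def decorator(func):
        _COMPUTES[task_name] = func
        return func
    return decorator


def register_schedule(task_name):
    """Hypothetical decorator: store a schedule function under task_name."""
    def decorator(func):
        _SCHEDULES[task_name] = func
        return func
    return decorator


@register_compute("toy_dense.cuda")
def toy_dense_compute(cfg, batch, in_dim, out_dim):
    # Stand-in for the compute definition: only describe the output.
    return {"op": "dense", "shape": (batch, out_dim), "reduce": in_dim}


@register_schedule("toy_dense.cuda")
def toy_dense_schedule(cfg, out):
    # Stand-in for the schedule template: it receives cfg and the output.
    return "schedule for {} with cfg={}".format(out["op"], cfg)


def lookup(task_name):
    """Return the (compute, schedule) pair registered under task_name."""
    return _COMPUTES[task_name], _SCHEDULES[task_name]
```

The point of the sketch is only that both functions take cfg as their first argument and are tied together by the shared name, so a tuner or compiler can find the matching schedule for a given compute.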

The schedule function takes two arguments: cfg and outs. cfg can be either “a tuning space” (when AutoTVM is tuning) or “a config entity” (when building a kernel). When writing a template, we assume cfg is an empty space and we have to define the tuning knobs in it, such as L134. The defined tuning factor can then be referenced in the rest of the template (e.g., L138).
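The two roles of cfg can be sketched with a toy model in plain Python. This is not the AutoTVM implementation: ToySpace, ToyEntity, toy_schedule, and the knob name "tile_k" with its candidate list are assumptions chosen for illustration. The same template body runs in both cases; only what cfg does with the knob definition differs.

```python
class ToySpace:
    """Toy stand-in for a tuning space: defining a knob records its candidates."""

    def __init__(self):
        self.knobs = {}

    def define_knob(self, name, candidates):
        self.knobs[name] = list(candidates)

    def __getitem__(self, name):
        # While the space is being collected, hand back the first
        # candidate as a placeholder so the template can keep running.
        return self.knobs[name][0]


class ToyEntity:
    """Toy stand-in for a config entity: one concrete value per knob."""

    def __init__(self, values):
        self.values = dict(values)

    def define_knob(self, name, candidates):
        # The entity already holds a concrete choice, so this is a no-op.
        pass

    def __getitem__(self, name):
        return self.values[name]


def toy_schedule(cfg, reduce_len):
    # Step 1: define the tuning factor on cfg.
    cfg.define_knob("tile_k", [1, 2, 4, 8])
    # Step 2: reference it in the rest of the template.
    tile_k = cfg["tile_k"]
    outer = -(-reduce_len // tile_k)  # ceiling division
    return (outer, tile_k)
```

Calling toy_schedule with a ToySpace fills in space.knobs, which is what a tuner would enumerate; calling it with ToyEntity({"tile_k": 8}) applies one specific choice.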

On the other hand, when building the kernel, as I mentioned, cfg is an exact config entity instead of a tuning space. If users have tuned the op and invoked apply_history_best, then cfg is the best config entity; otherwise, it is an empty entity. In that case, we need to define a fallback config (“workaround” in your words), like L135.
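The fallback case can be sketched the same way. Again this is a toy in plain Python under stated assumptions: ToyConfig, pick_tile_k, and the default of 64 are hypothetical and only mirror the idea of "if no tuned config was found, fill in a reasonable default before using it".

```python
class ToyConfig:
    """Toy config entity. Created without values, it acts as an empty fallback."""

    def __init__(self, values=None):
        # No tuned values supplied means we are in the fallback case.
        self.is_fallback = values is None
        self.values = dict(values or {})

    def __getitem__(self, name):
        return self.values[name]

    def __setitem__(self, name, value):
        self.values[name] = value


def pick_tile_k(cfg, reduce_len):
    if cfg.is_fallback:
        # Hypothetical default: cap the tile at 64, or use the whole
        # reduction axis when it is shorter than that.
        cfg["tile_k"] = 64 if reduce_len > 64 else reduce_len
    return cfg["tile_k"]
```

With a tuned entity such as ToyConfig({"tile_k": 8}) the stored value is used untouched; with ToyConfig() the template supplies its own default, so the kernel can still be built without any tuning log.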

I would be happy to see the tutorial if you would like to contribute one, so please let me know if there’s anything unclear and I could help figure it out.

Thank you both very much for the responses! They are really helpful. I’ll work on preparing the PR for adding this tunable template and ping back here if I have any more questions.