What can be sped up with Duff's device?

I can make a Relay function unroll a static number of times (see https://en.wikipedia.org/wiki/Duff's_device)
by using the Partial Evaluator. So I think one use case is to take an NLP model, unroll it, and do some fusion. Which NLP models allow us to fuse? LSTM doesn't seem fusable, as there is a dense and a sigmoid in every cell, so what else?
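To make the idea concrete, here is a minimal plain-Python sketch of unrolling a sequential scan by a static factor of 4. It is not Relay or Partial Evaluator code; the names `scan_rolled`, `scan_unrolled4`, and `step` are made up for illustration. Duff's device handles the leftover iterations by jumping into the middle of the unrolled body, which Python cannot express, so this sketch simply runs the `n % 4` leftover steps first.

```python
def scan_rolled(step, state, xs):
    """Reference version: one call to `step` per loop iteration."""
    for x in xs:
        state = step(state, x)
    return state


def scan_unrolled4(step, state, xs):
    """Same result as scan_rolled, with the loop body unrolled 4 times."""
    n = len(xs)
    i = 0
    # Run the n % 4 leftover steps first, so the main loop always
    # sees a multiple of 4 remaining elements.
    for _ in range(n % 4):
        state = step(state, xs[i])
        i += 1
    # Main loop: four consecutive steps with no loop test in between.
    # A compiler could try to fuse work across these four steps.
    while i < n:
        state = step(state, xs[i])
        state = step(state, xs[i + 1])
        state = step(state, xs[i + 2])
        state = step(state, xs[i + 3])
        i += 4
    return state
```

Whether the four back-to-back steps can actually be fused depends on what `step` does, which is exactly the open question here.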

I may be missing some background, so just for clarification:

unroll it and do some fusion.

You mean doing fusion between loop steps?

LSTM doesn't seem fusable.

Those four gates in LSTM cells are actually fused.
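One common reading of "the four gates are fused" is that the four per-gate dense layers are stacked into a single matrix product whose result is then split per gate. The sketch below illustrates that reading in plain Python with hypothetical helper names (`matvec`, `gates_separate`, `gates_fused`); biases are omitted and it is not taken from any particular framework.

```python
import math


def matvec(W, x):
    """Dense layer without bias: W is a list of rows, x is a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def gates_separate(Wi, Wf, Wg, Wo, xh):
    """Four independent dense ops, one per gate."""
    return (sigmoid(matvec(Wi, xh)), sigmoid(matvec(Wf, xh)),
            tanh(matvec(Wg, xh)), sigmoid(matvec(Wo, xh)))


def gates_fused(W_all, xh, hidden):
    """One dense op over the stacked weights, then a split per gate."""
    z = matvec(W_all, xh)
    h = hidden
    return (sigmoid(z[0:h]), sigmoid(z[h:2 * h]),
            tanh(z[2 * h:3 * h]), sigmoid(z[3 * h:4 * h]))
```

Here `W_all` is assumed to be the rows of `Wi`, `Wf`, `Wg`, and `Wo` concatenated in that order, and `xh` is the concatenation of the input and the previous hidden state. This is fusion within one cell; fusing across loop steps is a separate problem.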

Yes, I am wondering whether it is possible to fuse multiple loop steps.

Persistent RNN is a good starting point, but the engineering is non-trivial. I tried for two weeks, but several bugs with tvm.scan prevented me from getting it to work properly.

To detect fixed-length scanning and then unroll it, what do you think is a viable solution? My immature idea is to represent things using ADTs.

Okay, if your scope is limited to Duff's device, I am not sure there is such an application.

I don't need fixed-length scanning; I can make the PE unroll any recursive function to a fixed depth.
However, I still can't find any case where it is useful… Alas.
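As a rough illustration of what unrolling a recursive function to a fixed depth means, here is a plain-Python sketch. It is not the Relay Partial Evaluator, and `unroll`, `make_body`, and `sum_body` are hypothetical names. The function body is written with its recursive call abstracted as `rec`, then composed with itself `depth` times; calls that go deeper fall back to ordinary recursion. A real partial evaluator would inline the composed layers into straight-line code rather than keep them as closures.

```python
def unroll(make_body, depth):
    """Compose `depth` copies of the body on top of the plain recursion.

    make_body(rec) must return one layer of the function, where `rec`
    is whatever should be called in place of the recursive call.
    """
    def fixpoint(*args):
        # Ordinary, unbounded recursion: the fallback past `depth`.
        return make_body(fixpoint)(*args)

    f = fixpoint
    for _ in range(depth):
        f = make_body(f)  # wrap one more unrolled layer around f
    return f


def sum_body(rec):
    """One layer of a tail-recursive list sum."""
    def layer(xs, acc=0):
        if not xs:
            return acc
        return rec(xs[1:], acc + xs[0])
    return layer


# The first three elements go through unrolled layers,
# the rest through the recursive fallback.
sum_unrolled3 = unroll(sum_body, 3)
total = sum_unrolled3([1, 2, 3, 4, 5])
```

Nothing here requires the input length to be known in advance, which matches the point that fixed-length scanning is not needed.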