I was wondering if it is possible to schedule parallel prefix scan in TVM’s Halide IR.
For now, TVM has the primitive, tvm.scan, which is semantically equivalent to prefix scan. However, scheduling loops containing
scan.idx is not as easy compared with other loops, because of its sequential dependency.
If we do care about efficiency, what we probably need is to do reduction on a binary tree, like many textbooks may mention (like this). However, it does change the semantics of loops, so I suspect that it is hard to implement using TVM’s Halide IR.