Double_buffer schedule conflicts with tensorize

I put the example and the steps to reproduce here:
[https://github.com/dmlc/tvm/issues/2581]

It seems that if I enable tensorize on the stage, the double buffer schedule is gone. Is this a bug?

This is a good point. The main reason is probably a restriction of the double buffering schedule: it cannot work with generic copy intrinsics, because the semantics of such a copy are not known, so the rewriter chooses not to rewrite in this case. It would be great if we could discuss possible solutions and see whether other developers in the forum are interested in commenting.

I spent some time debugging. The direct cause is in how the double buffer schedule is injected: first we mark the variable to be double buffered as touched, then we visit the body. If we find that the variable is referenced in the body, we remove the double buffer schedule, see code here

If we don’t use tensorize, the stmt in the body is a “Store”. When visiting a Store stmt, TVM does not visit the buffer_var, so it is fine.
But my intrinsic needs the target buffer address as an argument, so when visiting the “Call” stmt, we iterate over the target buffer variable, which removes the target var from the set that the double buffer schedule touched.
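
To make the behaviour described above concrete, here is a minimal pure-Python sketch. It is a toy model and not TVM code: `Var`, `Store`, `Call`, `DoubleBufferScope`, `ToyDetector` and `my_copy_intrin` are all hypothetical names, and the traversal rules simply follow the description in this post (a Store does not visit its buffer_var, a Call visits all of its args, and any visited variable is dropped from the touched set).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass
class Store:
    buffer_var: Var  # destination buffer; the visitor never descends into it
    value: object    # expression being stored
    index: object    # index expression


@dataclass
class Call:
    name: str
    args: list


@dataclass
class DoubleBufferScope:
    var: Var    # variable marked for double buffering
    body: list  # statements inside the scope


class ToyDetector:
    """Toy stand-in for the detector: keeps vars that are marked but never visited."""

    def __init__(self):
        self.touched = set()

    def visit(self, node):
        if isinstance(node, DoubleBufferScope):
            self.touched.add(node.var)
            for stmt in node.body:
                self.visit(stmt)
        elif isinstance(node, Store):
            # Only value and index are traversed; buffer_var is skipped.
            self.visit(node.value)
            self.visit(node.index)
        elif isinstance(node, Call):
            for arg in node.args:
                self.visit(arg)
        elif isinstance(node, Var):
            # Any explicit reference cancels double buffering for that var.
            self.touched.discard(node)


def detect(stmt):
    detector = ToyDetector()
    detector.visit(stmt)
    return detector.touched


buf, src, i = Var("A_local"), Var("A"), Var("i")

# Without tensorize: the body is a Store into the double-buffered var.
plain = DoubleBufferScope(buf, [Store(buf, Call("load", [src, i]), i)])
# With tensorize: the intrinsic call receives the buffer var as an argument.
tensorized = DoubleBufferScope(buf, [Call("my_copy_intrin", [buf, src, i])])

print(detect(plain))       # {Var(name='A_local')}
print(detect(tensorized))  # set()
```

In the plain case the variable survives in the touched set, while passing it as a call argument removes it, which matches the symptom of the double buffer schedule silently disappearing.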

Could you explain more about why we want to clear the double buffer schedule when a reference to the target variable is found in the body?

Hi souptc, has this problem been solved? We have met the same problem.

I fixed it in our local repo by modifying the DoubleBufferDetector to skip the check on our buffer var.

I have tried this, and it causes an error in the Mutate_(Variable) of DoubleBufferInjector. If I disable that check too, code can be generated, but the %2 mechanism does not appear in the args of the intrinsic call.

@zfhn @souptc This problem is not solved. We have met the same problem. Can you give me some advice?

I have tried another way to solve this problem. The main idea is as follows:

  1. Tensorize the corresponding stage with your own intrinsic first.
  2. Set the double buffer scope of the corresponding stage.
  3. Modify the original double buffer pass, which mainly targets GPU: traverse all call ops, then update the offset of each buffer that is scoped with double buffer.
  4. Insert the sync intrinsic if needed.

I hope it helps.
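
Step 3 above could look roughly like the following pure-Python sketch. This is only an illustration under assumptions, not the modified pass: the `Var` and `Call` classes, the three-argument `access_ptr(buffer, offset, extent)` call (a simplified, hypothetical stand-in for an access-pointer intrinsic), `my_copy_intrin`, and the helper names are all made up here. It walks every call, and for each access pointer on a double-buffered variable it adds `(loop_var % 2) * stride` to the offset. Doubling the allocation and inserting syncs are not covered.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    name: str
    args: Tuple[object, ...]


def switch_offset(offset, loop_var, stride):
    """Build offset + (loop_var % 2) * stride as a nested expression."""
    half = Call("mul", (Call("mod", (loop_var, 2)), stride))
    return Call("add", (offset, half))


def rewrite_calls(expr, double_buffered, loop_var):
    """Rewrite access_ptr calls on double-buffered vars, recursing into all call args.

    double_buffered maps a buffer Var to the size of one half of the buffer.
    """
    if not isinstance(expr, Call):
        return expr
    args = tuple(rewrite_calls(a, double_buffered, loop_var) for a in expr.args)
    if expr.name == "access_ptr" and args[0] in double_buffered:
        buf, offset, extent = args
        new_offset = switch_offset(offset, loop_var, double_buffered[buf])
        return Call("access_ptr", (buf, new_offset, extent))
    return Call(expr.name, args)


def evaluate(expr, env):
    """Evaluate an integer expression built from add/mul/mod calls."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, Var):
        return env[expr]
    lhs, rhs = (evaluate(a, env) for a in expr.args)
    if expr.name == "add":
        return lhs + rhs
    if expr.name == "mul":
        return lhs * rhs
    if expr.name == "mod":
        return lhs % rhs
    raise ValueError(f"unsupported op: {expr.name}")


a_local, a, i = Var("A_local"), Var("A"), Var("i")
intrin = Call("my_copy_intrin", (
    Call("access_ptr", (a_local, 0, 64)),                # destination, double buffered
    Call("access_ptr", (a, Call("mul", (i, 64)), 64)),   # source, left unchanged
))

rewritten = rewrite_calls(intrin, {a_local: 64}, i)
dst_offset = rewritten.args[0].args[1]
print([evaluate(dst_offset, {i: k}) for k in range(4)])  # [0, 64, 0, 64]
```

The destination offset alternates between the two halves of the buffer on successive iterations, which is the %2 switching that was reported missing from the intrinsic call args earlier in the thread.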

Thank you for your advice, but I am not familiar with the double buffer pass code. Can you provide the modified double buffer pass code for reference?

Hi @SYangDong

Could you provide some code showing how you modified the buffer’s offset?

Thanks!