Graph Partitioning and Heterogeneous Execution


@zhiics I’m curious about the timeline for this item. It will be particularly useful for our upcoming FPGA backends (especially for PCIe-based FPGA platforms).



@thierry I think it is on the 0.5 roadmap. I have something working now, and we tested it successfully on a processor with an Intel CPU and Intel Graphics. I am focusing more on some design details now.


Thanks, this will be very useful for FPGA backends down the road. Is a PR for this already open? If not, what’s the timeline? Thank you.


@thierry It’s currently in my private repo. We are testing it with some use cases. If you want it soon, I can probably add you to the repo and we can start from there. Sounds like a plan?


That would be fantastic! I’d be happy to provide feedback as well with respect to our FPGA examples. My github ID is tmoreau89.


Is it possible to put the WIP in a public fork? I assume many people watching this thread would be interested in what is going on, and it would help make things more accessible to the broader community.


Good idea, if that’s not too much of a hassle, it would be great to have the community also provide feedback on your WIP.


@tqchen @thierry Sounds good. I can check with somebody internally and see how to proceed.


Got the approval and sent the WIP out:


What do you think about having a dedicated module that handles memory syncing across devices, rather than explicit copy nodes in the graph? Maybe too much of a rewrite… @tqchen
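
For readers following along, here is a minimal sketch of what "explicit copy nodes in the graph" could look like on a toy graph. It is only an illustration under stated assumptions: `Node`, `insert_copy_nodes`, and the device names are hypothetical and are not taken from the WIP or from any TVM API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical toy node: a name, a device annotation, and producer names.
    name: str
    device: str
    inputs: list = field(default_factory=list)


def insert_copy_nodes(nodes):
    """Return a new node list with a copy node on every cross-device edge.

    Assumes `nodes` is already topologically sorted.
    """
    by_name = {n.name: n for n in nodes}
    result = []
    emitted = set()  # names of copy nodes already added
    for node in nodes:
        new_inputs = []
        for src_name in node.inputs:
            src = by_name[src_name]
            if src.device != node.device:
                # Edge crosses a device boundary: route it through a copy node.
                copy_name = f"copy_{src.name}_to_{node.device}"
                if copy_name not in emitted:
                    result.append(Node(copy_name, node.device, [src.name]))
                    emitted.add(copy_name)
                new_inputs.append(copy_name)
            else:
                new_inputs.append(src_name)
        result.append(Node(node.name, node.device, new_inputs))
    return result


# Toy graph: the middle two ops are annotated for a (hypothetical) FPGA device.
graph = [
    Node("data", "cpu"),
    Node("conv", "fpga", ["data"]),
    Node("relu", "fpga", ["conv"]),
    Node("softmax", "cpu", ["relu"]),
]

partitioned = insert_copy_nodes(graph)
for n in partitioned:
    print(n.name, n.device, n.inputs)
```

In this sketch the copies are visible in the graph itself. A dedicated syncing module would presumably perform the same device check when a node reads its inputs, leaving the graph unchanged.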