[heterogeneous execution] How to use Relay heterogeneous execution in whole model execution?

Currently, the only Relay heterogeneous execution example I can find is the unit test, which constructs a network with the Relay API and annotates ops by hand. But how can we use heterogeneous execution when we import a model with something like `func, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)`? For example, I want the conv2d ops of this model to execute on GPU, but all other ops on CPU.

@zhiics

@FrozenGene

This should be helpful to you:

Thanks @zhiics. I will try it later. However, don’t you think we could provide a friendlier interface? For example, let users pass an op_name_target_dict to build_config or as an argument of build, and we do this pass internally. This would save users from writing the pass every time.
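To make the proposal concrete, here is a hypothetical sketch of what that mapping could look like. The names `op_name_target_dict` and `pick_target` are illustrative only, not existing TVM API:

```python
# Hypothetical sketch of the interface proposed above: callers hand the
# build step a mapping from op name to target, and the annotation pass is
# derived from it internally. Not real TVM API.

def pick_target(op_name, op_name_target_dict, default_target="llvm"):
    """Return the target an op should be compiled for.

    Ops absent from the mapping fall back to the default target.
    """
    return op_name_target_dict.get(op_name, default_target)

# Example: conv2d on GPU, everything else on CPU.
mapping = {"nn.conv2d": "cuda"}
assert pick_target("nn.conv2d", mapping) == "cuda"
assert pick_target("nn.relu", mapping) == "llvm"
```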

@FrozenGene Yeah, what you are describing is doable, but it introduces extra fields to build_config, which doesn’t seem necessary, because you can use a separate pass (as in the link above) to achieve the same functionality easily.
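For readers landing here later, the "separate pass" approach can be sketched roughly as below. This is a minimal, untested sketch assuming the `relay.ExprMutator` / `relay.annotation.on_device` API from the era of this thread; treat the exact signatures as assumptions and check them against the linked unit test:

```python
# Sketch: annotate every conv2d so it runs on the GPU, leaving other ops on
# the default (CPU) device. The offload predicate is plain Python so it can
# be reused and tested on its own.

OFFLOAD_OPS = {"nn.conv2d"}  # op names to send to the GPU (assumption)

def should_offload(op_name):
    """Decide, by op name alone, whether a call gets a GPU annotation."""
    return op_name in OFFLOAD_OPS

try:
    import tvm
    from tvm import relay

    class ScheduleConv2dOnGPU(relay.ExprMutator):
        """Rewrite pass wrapping selected calls in on_device annotations."""

        def __init__(self, device):
            super().__init__()
            self.device = device  # e.g. tvm.gpu(0)

        def visit_call(self, call):
            new_call = super().visit_call(call)
            op_name = getattr(call.op, "name", None)  # non-Op callees have no name
            if op_name is not None and should_offload(op_name):
                return relay.annotation.on_device(new_call, self.device)
            return new_call
except ImportError:
    pass  # TVM not installed; the pure-Python predicate above still works
```

After annotating (e.g. `func = ScheduleConv2dOnGPU(tvm.gpu(0)).visit(func)`), the annotations still need to be rewritten and a per-device target dict passed to `relay.build`; the unit test linked above shows the exact sequence for the TVM version in use.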

I agree with @FrozenGene here. I think there are many use cases that could be satisfied if we had a somewhat versatile graph partitioning infrastructure. In my experience, partitioning is actually multi-level:

  • Device partitioning - Whether we want to run on CPU/GPU/DSP, etc.
  • Unit partitioning - Within a device, whether we want to run on unit 1 or unit 2. In the case of ASICs, for instance, we might want to pipeline sections of the original neural network. This is done for RNNs and LSTMs to keep the weights on-chip.
  • Codegen partitioning - Do you want to use TVM for codegen, or QNNPack, or MKLDNN, or some in-house codegen? (This is extremely useful if we are heavily using tensorize while writing schedules, because at that point we are reinventing the wheel that HW vendors have already built.)
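The three levels above could be captured in a single placement record per subgraph. This is a purely illustrative data structure, not an existing TVM API; all field names here are assumptions:

```python
# Hypothetical per-subgraph placement record covering the three
# partitioning levels discussed above. Illustrative only.
from dataclasses import dataclass

@dataclass
class Placement:
    device: str   # device partitioning: "cpu", "gpu", "dsp", "asic", ...
    unit: int     # unit partitioning: which unit within the device
    codegen: str  # codegen partitioning: "tvm", "qnnpack", "mkldnn", ...

# Example: a conv block compiled by TVM on the GPU, and an LSTM cell
# pipelined onto unit 1 of an ASIC with an in-house codegen.
conv_block = Placement(device="gpu", unit=0, codegen="tvm")
lstm_cell = Placement(device="asic", unit=1, codegen="in-house")
```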

This brings in a couple of other points to be considered

  • op_name might not be enough to perform the partitioning. One might want to keep the first conv2d on device 1 and the rest of the convs on device 2.
  • I think we want full subgraphs that can be tagged with a target, so that we can run all target-dependent Relay passes (like AlterOpLayout) on those subgraphs individually.
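The first point, that op_name alone is not enough, suggests stateful placement predicates rather than a flat name-to-target dict. A hypothetical sketch (names are illustrative, not TVM API):

```python
# Stateful placement predicate: the first conv2d goes to one device and
# every later conv2d to another, which a plain op_name -> target dict
# cannot express. Illustrative only.

class FirstConvSplitter:
    """Place the first conv2d on dev1 and all remaining ops on dev2."""

    def __init__(self, dev1="cuda", dev2="llvm"):
        self.dev1 = dev1
        self.dev2 = dev2
        self.conv_seen = 0

    def place(self, op_name):
        if op_name == "nn.conv2d":
            self.conv_seen += 1
            return self.dev1 if self.conv_seen == 1 else self.dev2
        return self.dev2  # non-conv ops stay on the default device

splitter = FirstConvSplitter()
ops = ["nn.conv2d", "nn.relu", "nn.conv2d", "nn.conv2d"]
placements = [splitter.place(op) for op in ops]
# → ["cuda", "llvm", "llvm", "llvm"]
```

The same visitation-order trick would slot into an ExprMutator-style pass: the predicate carries the state, and the pass just asks it where each call should go.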

I think the problem is that a CLI is probably not a user-friendly way to change nodes or add attributes to a graph. If we had something like a Relay visualizer (like TensorBoard’s Graph Visualization), it could be a good interface for heterogeneous annotation, or for applying the multi-level partitioning @janimesh mentioned above.

Yes, we may also need to bring multi-level partitioning for heterogeneous execution into TVM in the future. That would require a more systematic partitioning approach that considers metrics beyond just op_name; e.g., hardware features and network structure might also need to be taken into account.

But again, this probably still indicates that we’d better not add extra fields/configs to build_config for now, because: 1) simple annotations, like all conv ops on GPU and nms on CPU, can easily be done with a small pass that users/developers can customize; and 2) for more complicated annotation/partitioning schemes, it looks to me that simply adding one or two fields still wouldn’t solve the problem.

@imorinaga Yes, a visualizer can help users with annotation if they already know how they’d like to partition the graph. But in cases where the partitioning should be decided by the compiler/runtime based on profiled data or runtime behavior, a visualizer might not be very helpful.