Scheduling the obtained graph

I am playing with the ONNX super-resolution model. After obtaining the graph, how can I store it in tensors using placeholders so that I can apply my own scheduling to them?

@eqy do you know anything about this?

I am not sure I understand the question here; are you asking about converting from ONNX to TVM? Or is the question more about how to schedule operators?

(I do not know what you mean by storing a graph in tensors)

Please correct me if I have misunderstood the concept.
My question is simply this: say I have a model, and by compiling it with NNVM we get the graph. Our target is an ARM CPU. So all I can do is tune that model for the target and then run it there?
What if I want to apply some other optimizations myself, like multi-threading or other optimization features? Or will tuning for the respective target by itself give me maximally optimized inference?

Typically, tuning is the easiest and most efficient way to get good performance, unless you are an expert in the scheduling transformations best suited to your hardware architecture. You can look at the tune_nnvm_arm tutorial for an example.
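
Roughly, the flow from that tutorial looks like this (a minimal sketch against the NNVM-era API; `net`, `params`, and `shape_dict` are assumed to come from an earlier frontend import, the board key `'rk3399'` is just an example, and exact signatures vary across TVM versions):

```python
import nnvm
import nnvm.compiler
import tvm
from tvm import autotvm

# Assumed inputs: `net`, `params`, `shape_dict` from an nnvm.frontend import
target = tvm.target.arm_cpu('rk3399')  # example board key

# Extract the tunable tasks (e.g. conv2d) from the graph
tasks = autotvm.task.extract_from_graph(
    net, target=target, shape=shape_dict, dtype='float32',
    symbols=(nnvm.sym.conv2d,))

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.RPCRunner('rk3399', host='localhost', port=9190,
                             number=5, timeout=10))

# Tune each task and log the results
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(n_trial=1000, measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file('tune.log')])

# Rebuild the graph with the best schedules found during tuning
with autotvm.apply_history_best('tune.log'):
    with nnvm.compiler.build_config(opt_level=3):
        graph, lib, params = nnvm.compiler.build(
            net, target=target, shape=shape_dict, params=params)
```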

Yes, I looked at it, but for this I would need an RPC server. It seems my host is unable to connect to the device through the RPC server, and I can't have them on the same network. What other method do I have to run the code on the ARM device?

If you cannot have them in the same network, can you use SSH forwarding? We have a mechanism to report a custom public IP address back if NAT is involved.
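
As a sketch (the port, file names, and gateway host here are made up for illustration): forward the device's RPC port through a machine both ends can reach, and the host can then connect to it as if it were local.

```python
# On the host, first forward the port over SSH, e.g.:
#   ssh -L 9090:<device-ip>:9090 user@gateway
# After that the device's RPC server appears local to the host.
from tvm import rpc

remote = rpc.connect('127.0.0.1', 9090)  # the forwarded port
remote.upload('net.tar')                 # push a compiled library to the device
rlib = remote.load_module('net.tar')     # load it on the device for execution
```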

@eqy I have done optimization for specific hardware before. Could you guide me on how I can perform my own scheduling on the obtained graph? Is that feature available, or do I have to dig into it on my own?

There are many examples of schedule transformations for operators, both as tutorials such as this one and in the form of AutoTVM templates in TOPI.
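
To give a flavor of what an operator-level schedule transformation looks like, here is a toy example in the style of those tutorials (using the NNVM-era API, where these functions live directly under `tvm`; newer releases moved them under `tvm.te`):

```python
import tvm

n = 1024
A = tvm.placeholder((n, n), name='A')
B = tvm.placeholder((n, n), name='B')
C = tvm.compute((n, n), lambda i, j: A[i, j] + B[i, j], name='C')

s = tvm.create_schedule(C.op)
# Tile the 2D iteration space for cache locality, then parallelize the
# outer loop across cores and vectorize the innermost loop.
io, jo, ii, ji = s[C].tile(C.op.axis[0], C.op.axis[1], 32, 32)
s[C].parallel(io)
s[C].vectorize(ji)

# For an ARM device you would pass a cross-compilation target string
# instead of plain 'llvm'.
f = tvm.build(s, [A, B, C], target='llvm')
```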

Thanks for the reply, but I was hoping for a tutorial on how to optimise the graph's inference with my own scheduling. In all the available tutorials, we define the tensors, the functions, the operations, and the scheduling, and finally we get the shared library for the compiled functions.
Currently, we load the graph, compile it, and run inference on it.
My main agenda here is: I have a model (tflite or TensorFlow, basically any model that TVM supports). Now, not only do I have to run that model on TVM for inference, but I also have to optimise it further (beyond tuning for the particular target) for a particular target, say an ARM CPU. Is this possible?

I am not sure I understand the question. What do you mean by “optimize it further?” Tuning chooses between schedule transformations, but you are free to modify the search space explored by tuning. At a higher level (e.g., the graph), you are also free to define variants of operators with different data layouts better suited to your particular hardware device.
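
Concretely, an AutoTVM template is just a schedule whose choices are declared as knobs; those knobs define the search space, and you can add, remove, or constrain them to steer tuning toward what works on your device. A toy sketch (the template and knob names are illustrative, not from TOPI):

```python
import tvm
from tvm import autotvm

@autotvm.template
def vector_add(n):
    A = tvm.placeholder((n,), name='A')
    B = tvm.placeholder((n,), name='B')
    C = tvm.compute((n,), lambda i: A[i] + B[i], name='C')
    s = tvm.create_schedule(C.op)

    cfg = autotvm.get_config()
    # This knob *is* the search space: tuning tries different split
    # factors and keeps whichever runs fastest on your hardware.
    cfg.define_split('tile_i', C.op.axis[0], num_outputs=2)
    io, ii = cfg['tile_i'].apply(s, C, C.op.axis[0])
    s[C].parallel(io)
    return s, [A, B, C]

# task = autotvm.task.create(vector_add, args=(1024,), target='llvm')
```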

@rahulkodinya would you like to add some clarity to the question?

@eqy We are interested in manually writing the schedules to accelerate a trained model, so that we are able to tune the execution for custom CPU hardware with a specific number of cores, L1/L2 caches, etc.

So far we have been manually writing schedules for image-processing convolutions in Halide, and now we want to try the same with TVM.

This is possible, and the current TOPI system allows you to register custom schedules for your hardware.
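
For example, TOPI's generic schedule functions dispatch on the current target, so you can register an override under your target's key (a sketch of the mechanism for NNVM-era TOPI; the schedule body is left as a stub to fill in with your core-count and cache-aware transformations):

```python
import tvm
import topi
from topi import generic

# 'arm_cpu' is the dispatch key used by targets created with
# tvm.target.arm_cpu(); substitute the key for your own hardware.
@generic.schedule_conv2d_nchw.register(['arm_cpu'])
def my_schedule_conv2d_nchw(outs):
    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
    s = tvm.create_schedule([x.op for x in outs])
    # ...apply your own tiling/parallelization here, sized to the
    # number of cores and the L1/L2 cache capacities...
    return s
```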