How can I supply pre-compiled OpenCL kernels to TVM?

I’ve got my test app working with OpenCL kernels in a source file. Now I want to use an offline tool to compile those kernels and load the resulting binary at runtime, avoiding the JIT compilation cost. How can I do this?

In order to benefit from ahead-of-time compilation and OpenCL, you would need to know your workload exactly. In this case, is there a difference between what you need and what you can get by just running a warmup run of your known workload to pre-JIT everything?

Thanks for your response. Yes, I think there is a difference: I don’t want the overhead cost of the warmup. Or rather, I want to do that once and then use the compiled binaries from that point on.

I’m also trying to pre-compile CL kernels. After investigating, I found that the CL kernel source is embedded in the .so library and is compiled online. I can get the compiled binary via `clGetProgramInfo`, but I don’t know how to insert that binary back into the .so library.
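For reference, extracting the binary looks roughly like this (a minimal sketch, assuming the `cl_program` has already gone through `clBuildProgram`; `DumpProgramBinary` is just a name for this example):

```cpp
// Minimal sketch: dump a built program's device binary via clGetProgramInfo.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

// Writes the first device's binary of an already-built `program` to `path`.
bool DumpProgramBinary(cl_program program, const char* path) {
  cl_uint num_devices = 0;
  clGetProgramInfo(program, CL_PROGRAM_NUM_DEVICES, sizeof(num_devices),
                   &num_devices, nullptr);
  if (num_devices == 0) return false;

  // One binary size per device the program was built for.
  std::vector<size_t> sizes(num_devices);
  clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                   sizes.size() * sizeof(size_t), sizes.data(), nullptr);

  // CL_PROGRAM_BINARIES expects an array of caller-allocated buffers.
  std::vector<std::vector<unsigned char>> bins(num_devices);
  std::vector<unsigned char*> ptrs(num_devices);
  for (cl_uint i = 0; i < num_devices; ++i) {
    bins[i].resize(sizes[i]);
    ptrs[i] = bins[i].data();
  }
  clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                   ptrs.size() * sizeof(unsigned char*), ptrs.data(), nullptr);

  FILE* fp = std::fopen(path, "wb");
  if (fp == nullptr) return false;
  std::fwrite(bins[0].data(), 1, bins[0].size(), fp);
  std::fclose(fp);
  return true;
}
```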

Right now, the CL kernel is compiled when inference runs for the first time. I moved kernel compilation to model initialization, so it doesn’t affect inference time.
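In case it helps others, the same idea in TVM’s C++ runtime looks roughly like this (a minimal sketch, not the poster’s actual code; the `tvm.graph_executor.create` global is named `tvm.graph_runtime.create` on older TVM releases, and `LoadAndWarmup` is just a name for this example):

```cpp
// Minimal sketch: trigger OpenCL JIT compilation at model-load time by doing
// one warmup run, so the first real inference doesn't pay the build cost.
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>
#include <string>

tvm::runtime::Module LoadAndWarmup(const std::string& lib_path,
                                   const std::string& graph_json,
                                   DLDevice dev) {
  tvm::runtime::Module lib = tvm::runtime::Module::LoadFromFile(lib_path);
  const tvm::runtime::PackedFunc* create =
      tvm::runtime::Registry::Get("tvm.graph_executor.create");
  tvm::runtime::Module gmod =
      (*create)(graph_json, lib, static_cast<int>(dev.device_type),
                dev.device_id);
  // Inputs are uninitialized here; this run only exists to force
  // clBuildProgram on every kernel before real inference starts.
  gmod.GetFunction("run")();
  return gmod;
}
```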

Is there any update on this topic? The OpenCL JIT compilation takes too long on my device (several minutes!).

Also, any suggestion on how to speed up the compilation process would be greatly appreciated.

I’ve got the same issue: compilation takes 40 seconds, but it runs really fast once the kernel is compiled. Has anyone figured out a way to AOT-compile the kernel so I could just ship the compiled kernel? What I’m working on only has to run on a single piece of hardware. Thanks!

@gasgallo @sunzj any luck caching the kernel or anything? Thanks

You could compile it and save it as clbin, then serialize that into the compiled binary again. After this, you would get a much faster load time.
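For context, “save it as clbin” means dumping the binary (as with `clGetProgramInfo` above) and handing it back to the driver with `clCreateProgramWithBinary` instead of recompiling source. A minimal sketch of the load side (error handling trimmed):

```cpp
// Minimal sketch: re-create an OpenCL program from a saved clbin blob.
#include <CL/cl.h>
#include <cstddef>

cl_program LoadProgramBinary(cl_context ctx, cl_device_id dev,
                             const unsigned char* blob, size_t size) {
  cl_int bin_status = CL_SUCCESS;
  cl_int err = CL_SUCCESS;
  cl_program prog = clCreateProgramWithBinary(ctx, /*num_devices=*/1, &dev,
                                              &size, &blob, &bin_status, &err);
  if (err != CL_SUCCESS || bin_status != CL_SUCCESS) return nullptr;
  // clBuildProgram is still required, but is cheap for a pre-built binary.
  if (clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr) != CL_SUCCESS) {
    clReleaseProgram(prog);
    return nullptr;
  }
  return prog;
}
```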

@FrozenGene forgive my noobness. I see two references to clbin in the TVM source, but I’m not really sure where to go from there: https://github.com/apache/tvm/search?q=clbin. It looks like the only place that calls `OpenCLModuleCreate` hardcodes “cl”, not “clbin”: https://github.com/apache/tvm/blob/37a8d7b2b7df647a5dac6fbda18ee54d902ce4e4/src/target/source/codegen_opencl.cc#L536

Any chance you could give me some more info? I’m also trying to cross-compile things, though I’d be fine if the kernel could be compiled to a file on the device that I could cache.

Thanks!

@ryanstout Yes, you’ll need to do some work to complete it. You could refer to tvm/codegen_aocl.cc at 37a8d7b2b7df647a5dac6fbda18ee54d902ce4e4 · apache/tvm (github.com), which cross-compiles the kernels into a binary cache file. However, if you unfortunately don’t have a cross-compilation tool, I have one hack method you could use:

1. Run the model on the device board once and save the compiled result to a file as clbin. For this first run, you need to add the save logic here: tvm/opencl_module.cc at 37a8d7b2b7df647a5dac6fbda18ee54d902ce4e4 · apache/tvm (github.com).
2. Pull the clbin file from the device.
3. For the second run, set an environment variable `TVM_CLBIN_PATH`, go to https://github.com/apache/tvm/blob/37a8d7b2b7df647a5dac6fbda18ee54d902ce4e4/src/target/source/codegen_opencl.cc#L536, and add logic to check the `TVM_CLBIN_PATH` value. If it is set, great: we are in the second run and already have a compiled clbin file at that path, so you can return `OpenCLModuleCreate(code.str(), "clbin", ExtractFuncInfo(mod), code.str());` instead (see the sketch after this list).
4. Of course, don’t forget to add clbin support in tvm/opencl_module.cc at 37a8d7b2b7df647a5dac6fbda18ee54d902ce4e4 · apache/tvm (github.com).
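A rough illustration of step 3 (the `ReadClbinIfSet` helper is invented for this sketch, and `TVM_CLBIN_PATH` is a custom variable you would be introducing, not an upstream TVM feature; the sketch passes the file contents as the module data, since the "clbin" branch you add in opencl_module.cc will want the binary rather than the source):

```cpp
// Minimal sketch of the second-run check for
// src/target/source/codegen_opencl.cc.
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Returns the contents of the file named by TVM_CLBIN_PATH, or an empty
// string when the variable is unset (i.e. we are still in the first run).
std::string ReadClbinIfSet() {
  const char* path = std::getenv("TVM_CLBIN_PATH");
  if (path == nullptr) return "";
  std::ifstream fs(path, std::ios::binary);
  std::stringstream buf;
  buf << fs.rdbuf();
  return buf.str();
}

// In BuildOpenCL, just before the existing
//   return OpenCLModuleCreate(code.str(), "cl", ExtractFuncInfo(mod), code.str());
// add:
//
//   std::string clbin = ReadClbinIfSet();
//   if (!clbin.empty()) {
//     // Second run: ship the pre-compiled binary instead of the source.
//     return OpenCLModuleCreate(clbin, "clbin", ExtractFuncInfo(mod), code.str());
//   }
```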