[Bug][Relay] AlterOpLayout attempting bad conversions for CPU

I have a slightly unconventional architecture that at one point performs a convolution on an input with shape (1, 1, 32, 32). When building for a GPU target, everything works well and the graph runs. On CPU, however, it looks like Relay attempts to convert the tensor from NCHW to NCHW8c, which of course triggers an error since the channel dimension of 1 isn't divisible by 8. I'm not sure why Relay is trying to make this conversion or how I can prevent it from doing so. I'd really appreciate any tips!
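The relevant fragment of the graph looks roughly like this (variable names, the weight shape, and the target string are illustrative, not my exact model):

    from tvm import relay

    # A conv2d applied to a (1, 1, 32, 32) input; the rest of the network
    # downstream of this convolution is omitted for brevity.
    data = relay.var("data", shape=(1, 1, 32, 32), dtype="float32")
    weight = relay.var("weight", shape=(8, 1, 3, 3), dtype="float32")
    conv = relay.nn.conv2d(data, weight, channels=8, kernel_size=(3, 3), padding=(1, 1))
    net = relay.Function([data, weight], conv)
    target = "llvm"  # CPU target; the GPU build of the same graph works fine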

Here’s the invocation and error message I get:

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build_module.build(
        net, target=target, params=params)

TVMError: Check failed: pb->value != 0 (0 vs. 0) : Divide by zero
Error during compile function
-----------------------------
v0.0.1
fn (%p0: Tensor[(1, 1, 32, 32), float32], __dict__=meta[StrMap][0]) -> Tensor[(1, 0, 32, 32, 8), float32] {
  layout_transform(%p0, src_layout="NCHW", dst_layout="NCHW8c") /* ty=Tensor[(1, 0, 32, 32, 8), float32] */
}

Currently, if no dispatch context (fallback context) is set for autotvm, it will download the pre-tuned log file from tophub. If a conv2d workload doesn't exist in that log, it will infer a cfg, which can be incorrect for the x86 target. To avoid this, you can either 1) autotune your conv2d network, or 2) create an empty log file and put the build under

with tvm.autotvm.apply_history_best(empty_log):

This will use default schedules.
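A rough sketch of that workaround (the log filename here is just a placeholder, and net/target/params are whatever you already pass to relay.build):

    from tvm import autotvm, relay

    # Create an empty tuning log so autotvm has an explicit (empty) history
    # instead of falling back to the tophub pre-tuned logs.
    empty_log = "empty_tuning.log"
    open(empty_log, "w").close()

    with autotvm.apply_history_best(empty_log):
        with relay.build_config(opt_level=3):
            graph, lib, params = relay.build_module.build(
                net, target=target, params=params)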

I think the current behavior (downloading the log from tophub and inferring a cfg even when no context is specified) can cause some confusion.
@eqy @merrymercy

@kevinthesun Where does the “infer cfg” happen?
For x86, I think it is implemented by


which can handle all cases correctly.

In the autotvm framework, there is no automatic inference of configs.

@merrymercy Yes. x86 is a target that has its own default schedule generator, and it shouldn't use the pre-tuned log to generate a cfg for an unknown workload. However, we recently hit a similar problem where the default generator fails to be used for an unknown workload (@wweic). We can solve this problem by ignoring the x86 tophub pre-tuned log. @jwfromm I'm not sure whether your problem is the same one. Did it generate a warning message like:

WARNING:autotvm:Cannot find config for target=llvm -mcpu=skylake-avx512, workload=.... A fallback configuration is used, which may bring great performance regression.

If not, it means the default generator was not called.

Thanks for your replies @kevinthesun, unfortunately using an empty log and autotuning from scratch didn't help with this error. Although autotuning goes well, when I attempt to build with opt_level=3 it fails again. I think the issue comes from my unusual architecture. The convolution is actually being used to generate padding, so the weights and inputs are constant. I think this causes it to get folded away during the FoldConstant pass, which prevents a sensible schedule from being applied. Then the AlterOpLayout pass tries to make an invalid conversion. Although this is a fringe case, it might be worth considering extending AlterOpLayout to check whether a conversion is valid before applying it.

As an interesting additional symptom of the problem, it seems that autotvm.task.extract_from_program doesn't find the [1, 1, 32, 32] convolutions, which means autotuning is never run on those layers. Applying the best history immediately after autotuning still gives the warning message that the schedule for the [1, 1, 32, 32] convolution couldn't be found.
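For reference, the tuning flow I used looks roughly like this (the log filename is illustrative, and the exact ops argument varies across TVM versions); the (1, 1, 32, 32) conv2d workloads simply never appear in the extracted task list:

    from tvm import autotvm, relay

    # Extract tunable conv2d tasks from the network; the (1, 1, 32, 32)
    # convolutions never show up in this list, so they are never tuned.
    tasks = autotvm.task.extract_from_program(
        net, target=target, params=params, ops=(relay.op.nn.conv2d,))

    # ... run the tuner on each task, appending records to "tuning.log" ...

    # Building with the tuning history applied still warns that no config
    # exists for the (1, 1, 32, 32) conv2d workload.
    with autotvm.apply_history_best("tuning.log"):
        with relay.build_config(opt_level=3):
            graph, lib, params = relay.build_module.build(
                net, target=target, params=params)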

@jwfromm Thank you for investigating! If you can provide a code snippet to reproduce this problem, it will be helpful for us to fix it.

This bug actually turned out to be a little different than I initially thought. The problem was that my model took the output of a convolution and added a noise tensor to it. As an example of why this is a problem, consider an NCHW convolution output with shape [1, 8, 32, 32]. The noise tensor being added had shape [1, 1, 32, 32] and was expected to broadcast across the channels. However, when Relay converts the convolution to NCHWc, it also tries to convert the noise tensor, but does so without broadcasting, resulting in a failed conversion since a channel dimension of 1 can't be packed into blocks of 8. I've fixed this on my end by explicitly broadcasting the noise tensor, but in a perfect world Relay would check whether broadcasting is required during AlterOpLayout.
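For anyone hitting the same thing, the fix looks roughly like this (variable names and shapes are illustrative, not my exact model):

    from tvm import relay

    # Conv output has shape (1, 8, 32, 32); the noise tensor has shape
    # (1, 1, 32, 32) and previously relied on implicit broadcasting in the add.
    data = relay.var("data", shape=(1, 1, 32, 32), dtype="float32")
    weight = relay.var("weight", shape=(8, 1, 3, 3), dtype="float32")
    noise = relay.var("noise", shape=(1, 1, 32, 32), dtype="float32")

    conv = relay.nn.conv2d(data, weight, channels=8, kernel_size=(3, 3), padding=(1, 1))

    # Before: relay.add(conv, noise). AlterOpLayout then tried to transform the
    # (1, 1, 32, 32) noise tensor to NCHW8c along with the conv output and failed.
    # After: broadcast the noise tensor to the conv output shape explicitly.
    noise_b = relay.broadcast_to(noise, (1, 8, 32, 32))
    out = relay.add(conv, noise_b)
    net = relay.Function([data, weight, noise], out)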