AutoTune fails on Windows because no tasks are extracted from the IRModule

Hi, I am trying to tune a network on Windows, but it fails because the number of tasks returned by autotvm.task.extract_from_program is zero. The log is below:

download failed due to URLError(gaierror(11004, 'getaddrinfo failed')), retrying, 2 attempts left
download failed due to URLError(gaierror(11004, 'getaddrinfo failed')), retrying, 1 attempt left
WARNING:root:Failed to download tophub package for cuda: <urlopen error [Errno 11004] getaddrinfo failed>

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\1\.conda\envs\py3.7\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\1\.conda\envs\py3.7\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\1\.conda\envs\py3.7\lib\site-packages\tvm-0.7.dev1-py3.7-win-amd64.egg\tvm\autotvm\task\relay_integration.py", line 57, in _lower
    opt_mod, _ = relay.optimize(mod, target, params)
  File "C:\Users\1\.conda\envs\py3.7\lib\site-packages\tvm-0.7.dev1-py3.7-win-amd64.egg\tvm\relay\build_module.py", line 303, in optimize
    mod, params = bld_mod.optimize(mod, target, params)
  File "C:\Users\1\.conda\envs\py3.7\lib\site-packages\tvm-0.7.dev1-py3.7-win-amd64.egg\tvm\relay\build_module.py", line 157, in optimize
    mod = self._optimize(mod, target)
  File "C:\Users\1\.conda\envs\py3.7\lib\site-packages\tvm-0.7.dev1-py3.7-win-amd64.egg\tvm\_ffi\_ctypes\packed_func.py", line 218, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
OSError: exception: stack overflow
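For reference, the extraction call follows the standard AutoTVM tutorial pattern. Below is a minimal sketch of the kind of setup involved; the frontend arguments, the input name/shape, and the conv2d op filter are illustrative assumptions rather than a copy of my actual script.

    import tvm
    from tvm import relay, autotvm

    # Assumption: graph_def is the TensorFlow GraphDef mentioned further below,
    # and the input name/shape are placeholders for illustration only.
    shape_dict = {"Placeholder": (1, 933, 933, 3)}
    mod, params = relay.frontend.from_tensorflow(graph_def, layout="NCHW", shape=shape_dict)

    target = tvm.target.cuda()
    tasks = autotvm.task.extract_from_program(
        mod["main"], target=target, params=params,
        ops=(relay.op.get("nn.conv2d"),))
    print("extracted %d tasks" % len(tasks))  # this ends up as 0 on Windows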

Any advice on how to make it work? Thanks a lot.

I have also tried to tune the network on Linux, and a similar problem occurs there.

On Linux, this problem can be fixed by editing extract_from_multiple_program to increase the stack size of the build thread:

        # Raise the stack size used for threads started after this call
        old_stack_size = threading.stack_size(1024 * 1024 * 1024)
        build_thread.start()
        build_thread.join()
        # Restore the stack size to its original value
        threading.stack_size(old_stack_size)
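For completeness, here is the same workaround written as a self-contained sketch that can be tried outside of TVM; the helper name and the 1 GB figure are placeholders.

    import threading

    def run_with_large_stack(fn, stack_bytes=1024 * 1024 * 1024):
        """Run fn() in a worker thread created with a larger stack."""
        result, error = [], []

        def _worker():
            try:
                result.append(fn())
            except BaseException as exc:  # surface failures from the worker
                error.append(exc)

        # The new size only applies to threads started after this call
        old_stack_size = threading.stack_size(stack_bytes)
        build_thread = threading.Thread(target=_worker)
        build_thread.start()
        build_thread.join()
        # Restore the stack size to its original value
        threading.stack_size(old_stack_size)

        if error:
            raise error[0]
        return result[0]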

But the problem still exists on Windows after changing the stack size.

This problem probably occurs because of the limited stack size of the thread, especially when tasks are extracted from a large network. A similar situation is described in autotvm-extract-tasks-from-bert-model-cause-segmentation-fault.

Any advice will be appreciated. Thanks a lot.

As the post you pointed to mentions, the stack overflow issue when extracting tasks from a large model has been resolved, unless your model has control flow and cannot be executed by the graph runtime. What model are you working on, how large is it, and does it have control flow?

The network I am working on is essentially a ResNet-101 plus FPN network from TensorFlow, and I believe there are no control-flow operations in it. All the operations are listed below:

Add
MaxPool
Placeholder
FusedBatchNorm
Conv2D
Const
Pad
BiasAdd
Relu
Identity
Transpose

All the workloads extracted from the network on Linux:

('conv2d_nchw.cuda', ('TENSOR', (1, 256, 232, 232), 'float32'), ('TENSOR', (256, 256, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 1024, 58, 58), 'float32'), ('TENSOR', (2048, 1024, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 512, 116, 116), 'float32'), ('TENSOR', (1024, 512, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 256, 232, 232), 'float32'), ('TENSOR', (512, 256, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 3, 933, 933), 'float32'), ('TENSOR', (64, 3, 7, 7), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 64, 232, 232), 'float32'), ('TENSOR', (64, 64, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 256, 232, 232), 'float32'), ('TENSOR', (64, 256, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 64, 232, 232), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 64, 232, 232), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 64, 232, 232), 'float32'), ('TENSOR', (256, 64, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 256, 232, 232), 'float32'), ('TENSOR', (128, 256, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 128, 233, 233), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 512, 116, 116), 'float32'), ('TENSOR', (128, 512, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 128, 116, 116), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 128, 116, 116), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 128, 116, 116), 'float32'), ('TENSOR', (512, 128, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 512, 116, 116), 'float32'), ('TENSOR', (256, 512, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 256, 117, 117), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 1024, 58, 58), 'float32'), ('TENSOR', (256, 1024, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 256, 58, 58), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 256, 58, 58), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 256, 58, 58), 'float32'), ('TENSOR', (1024, 256, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 1024, 58, 58), 'float32'), ('TENSOR', (512, 1024, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 512, 59, 59), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 2048, 29, 29), 'float32'), ('TENSOR', (512, 2048, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 512, 29, 29), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 512, 29, 29), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 512, 29, 29), 'float32'), ('TENSOR', (2048, 512, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')
('conv2d_nchw.cuda', ('TENSOR', (1, 2048, 29, 29), 'float32'), ('TENSOR', (256, 2048, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32')

If you are interested, here is the TensorFlow graph_def I am working on.

The stack overflow occurs at the call mod, params = bld_mod.optimize(mod, target, params) in tvm\relay\build_module.py. It works fine on Linux but fails on Windows.
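To take the tuning machinery out of the picture, the failing step can presumably be reproduced by calling the same optimize function directly, exactly as it appears in the traceback; mod and params are assumed to come from the TensorFlow frontend as sketched earlier.

    import tvm
    from tvm import relay

    # Assumption: mod and params were produced by relay.frontend.from_tensorflow.
    target = tvm.target.cuda()

    # This is the call made by autotvm's _lower helper in relay_integration.py;
    # according to the traceback, this is where Windows raises
    # "OSError: exception: stack overflow".
    opt_mod, opt_params = relay.optimize(mod, target, params)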

I run these experiments on a machine with 128 GB of memory, so the problem is probably not caused by insufficient memory.

There are some warnings before the stack overflow error:

download failed due to URLError(gaierror(11004, 'getaddrinfo failed')), retrying, 2 attempts left
download failed due to URLError(gaierror(11004, 'getaddrinfo failed')), retrying, 1 attempt left
WARNING:root:Failed to download tophub package for cuda: <urlopen error [Errno 11004] getaddrinfo failed>

I am wondering whether these warnings have anything to do with my problem. Any advice on how to debug this on Windows? Thanks a lot.

OK, then it looks like a Windows-specific problem. @jonso do you have any experience with similar issues?

I personally have not been able to use AutoTune on Windows, but I know that @jmorrill has a fork with some fixes.
