Unofficial autoTVM on Windows Guide

At work I only had limited access to capable Linux boxes with a GPU, but I was really impressed reading about this project and its benchmarks. I wanted to kick the TVM tires on my 16-core machine with an NVIDIA GTX 1080 (pretty slow compared to what all these guys use), but it runs Windows, and AutoTVM isn’t supported there.

I put in the time and have implemented enough to auto-tune the models we use on my Windows box. I hope it will be of use to someone else in a similar position, and possibly serve in the future as a proof of concept for someone more knowledgeable to build official support. I do question how important this is, since I don’t know how many Windows users are interested, and it could all be for nothing if Microsoft eventually adds GPU support to WSL (it won’t be in WSL 2.0, but they have signaled interest).

I have uploaded a hodge-podge guide on Google Docs here. It is in no way a bulletproof, step-by-step guide, but if you are familiar with installing TVM from source on Linux, it will hopefully be enough to get started on Windows. At the very least, it’s a reference for myself that I am sharing 🙂


Great job! Could you give more details about USE_OPENMP? What problem occurs when we use the TVM thread pool?

I am wondering whether we could have a section named “Resources” for community members’ docs, like this one and @mli’s Dive into Deep Learning Compiler. @tqchen

It’s been a few weeks, but IIRC a Python threading.Thread.start() call in check_remote(...) (measure_methods.py) was deadlocking: the thread target never ran, and thread.start() never returned.

I remember using Process Explorer to inspect the thread stacks and finding a Python thread blocked deep inside the CPython runtime. Setting TVM_NUM_THREADS=1 fixed it, but when building with OpenMP (USE_OPENMP), I didn’t need to set it.
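For anyone hitting the same hang, here is a minimal sketch of applying the TVM_NUM_THREADS workaround from Python. It assumes the variable is set before tvm is imported, which should be the safest point since TVM’s runtime thread pool reads TVM_NUM_THREADS when it starts. The helper name limit_tvm_threads is hypothetical.

```python
import os


def limit_tvm_threads(num_threads=1):
    """Set TVM_NUM_THREADS for this process and any child processes it starts.

    Hypothetical helper: call it before `import tvm` so the runtime's
    thread pool is created with the requested size. Changes to os.environ
    are inherited by processes spawned afterwards (e.g. tuning workers).
    """
    os.environ["TVM_NUM_THREADS"] = str(num_threads)
    return os.environ["TVM_NUM_THREADS"]
```

Setting it in the command prompt before launching Python (set TVM_NUM_THREADS=1) has the same effect.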

@jmorrill I am seeing an issue when trying to auto-tune on Windows:

AttributeError: Schedule object has no attribute code_hash

It comes from this block of task.py:

ctx = ApplyConfig(ret.config_space)
with ctx:
    with target:
        sch, _ = func(*args)
        ret.config_space.code_hash = getattr(sch, 'code_hash', None)

It seems that getattr shouldn’t be throwing here, and the default value should be used. Have you seen this before?

Are you using my custom branch?

That line throws on Windows, which I catch in my modification here: https://github.com/jmorrill/tvm/blob/9846d2c0d2480c77a5a2691fe4122757e0f248ff/python/tvm/autotvm/task/task.py#L196

Got it, I didn’t realize you had already fixed this. Is there any concern about committing this back to the main repo, or about fixing the root cause (the Oct 2019 commit)?

That code is part of AutoTVM, which of course isn’t officially supported on Windows, so I didn’t want to bug the reviewers with a PR fixing something that generally doesn’t work anyway. 🙂

Would you mind sending out a PR with your changes to make it work on Windows?

I’m happy to review it; I make an effort to mention Windows support in all of the changes I review. I think we should formalize the effort to make AutoTVM work on Windows 🙂

I think it would be great to formalize Windows support and would be happy to work on that, but my gut feeling is that buy-in from the project owners is unlikely, and I’m probably too shy to ask.

One reason is that supporting AutoTVM on Windows increases the maintenance and testing burden quite a bit. In my branch, I’ve been careful to preserve the behavior on POSIX platforms, but I had to add a lot of "if os.name == 'nt'" checks. Future changes the project owners want to make could be encumbered by having to support Windows.

I had to do a lot of small hacks to get Windows running close to Linux speed. Most of them were needed because Windows has no fork(), Python threading performs poorly, and multiprocessing.Process is very slow to spawn, so I had to cache processes and process pools.

I recently added Windows support to the C++ RPC server, which was a big perf win and less hacky compared to what I had to do with the Python RPC server.

I think the more work that can be moved out of Python and into C++, the better the chances of a good Windows implementation, especially in local_executor and xgboost_cost_model.

I totally understand what you’re saying.

How about this: would you mind sending a PR off of your fork? On the PR, we can have more detailed discussion on the specific code design. I really think it would be worthwhile to start committing these changes back. I think the other reviewers / committers would be happy to see improved Windows support.

Sounds great. Send me a link to your repo. I only have two PRs under my belt (over my whole life) so I’m still a bit green with some git features. =)

May not be able to get to my computer tonight but will be on tomorrow.

You can send a PR to the public TVM repo! Then we can get more input. And no problem on timing 🙂

I still cannot get it to work. Could you provide a working example?

What are you stuck on?

There are too many errors, and since the code does not print the errors properly, I formatted them a little:

Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\contrib\cc.py", line 185, in _windows_shared
    link_cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "C:\Program Files\Python37\lib\subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "C:\Program Files\Python37\lib\subprocess.py", line 1207, in _execute_child
    startupinfo)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\_ffi\_ctypes\function.py", line 72, in cfun
    rv = local_pyfunc(*pyargs)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\rpc\server.py", line 84, in load_module
    m = _load_module(path)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\module.py", line 266, in load
    _cc.create_shared(path + ".so", files)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\contrib\cc.py", lin
raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'))
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\_ffi\_ctypes\function.py", line 72, in cfun
    rv = local_pyfunc(*pyargs)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\autotvm\measure\measure_methods.py", line 621, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel

I tried to auto-tune with cuDNN, using the code from the “Auto-tuning a convolutional network for NVIDIA GPU” tutorial.

Did you build with -DUSE_CUDA=ON?

Are you executing under an “x64 Native Tools Command Prompt”?

Yes, I run the code from the VS2019 x64 Native Tools Command Prompt.

And TVM was built with CUDA and cuDNN.

Can you download and install the LLVM binaries from here: http://releases.llvm.org/download.html

Make sure the option to add the LLVM bin directory to the system PATH is checked in the installer UI.

Next, restart the x64 Native Tools Command Prompt so it picks up the new PATH variable.

It seems to me that lld-link.exe may not be installed; that is around where your stack trace shows the problem.
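A quick way to check this from the same prompt is a small sketch using shutil.which, which on Windows also tries the PATHEXT extensions, so "lld-link" matches lld-link.exe. It assumes, as the reply above suggests, that TVM’s Windows build step invokes lld-link; find_tool and report_tool are hypothetical names.

```python
import shutil


def find_tool(name="lld-link"):
    """Return the full path to `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def report_tool(name="lld-link"):
    """Human-readable status line for one required tool."""
    path = find_tool(name)
    if path is None:
        return f"{name}: not found on PATH (install LLVM, then reopen the prompt)"
    return f"{name}: {path}"


if __name__ == "__main__":
    print(report_tool("lld-link"))
```

Run it from the x64 Native Tools Command Prompt, since that is the PATH the tuning processes inherit; a one-liner such as python -c "import shutil; print(shutil.which('lld-link'))" does the same check.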

Did a draft PR for you @jonso

For the code_hash thing, here is the fix. xD