Why is auto-tuning always 0.0 GFLOPS?

I'm auto-tuning ResNet-50 on a Tesla V100 GPU. The output always looks like this:

[Task 1/23] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (160/2000) | 209.03 s

I opened the tmp log and copied one line. It seems to be tuning conv2d; the input shape is 1x1024x14x14 and the kernel shape is 2048x1024x1x1. This workload is not a heavy one, so why is it always 0.0 GFLOPS?

{"i": ["cuda", "topi_nn_conv2d", [["TENSOR", [1, 1024, 14, 14], "float32"], ["TENSOR", [2048, 1024, 1, 1], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 1024, 14, 14, "float32"], [2048, 1024, 1, 1, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 111087, "t": "direct", "c": null, "e": [["tile_f", "sp", [-1, 16, 128, 1]], ["tile_y", "sp", [-1, 7, 1, 1]], ["tile_x", "sp", [-1, 1, 1, 1]], ["tile_rc", "sp", [-1, 256]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 0]]}], "r": [[1000000000.0], 1, 0.05721116065979004, 1578299411.1211495], "v": 0.1}

When it says 0 GFLOPS, it means the schedule failed to run, probably due to an error. If you're consistently seeing all 0s across all your tasks, you probably have something set up incorrectly.

This result indicates that the error code is 1, which means "INSTANTIATION_ERROR". In my experience this error is caused by an unfit config, but it should not happen frequently for conv2d on a V100. You may need to check the other configs in the tuning log to see which error code is the most common one you got. You can find the meanings of the error codes here.
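One quick way to tally the error codes is to parse the log and count the second entry of each record's "r" field, which is the error number. A minimal sketch, assuming the AutoTVM JSON log format shown above (the two sample lines below are shortened, illustrative records, not real log output):

```python
import json
from collections import Counter

# Shortened, illustrative records in the AutoTVM JSON log format.
# The "r" field is [costs, error_no, all_cost, timestamp].
sample_lines = [
    '{"i": [], "r": [[1000000000.0], 1, 0.057, 1578299411.1], "v": 0.1}',
    '{"i": [], "r": [[0.00123], 0, 1.5, 1578299412.0], "v": 0.1}',
]

def count_error_codes(lines):
    """Count how often each error code appears in a tuning log."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        error_no = record["r"][1]   # second entry of "r" is the error code
        counts[error_no] += 1
    return counts

print(count_error_codes(sample_lines))  # Counter({1: 1, 0: 1})
```

In practice you would read the lines from your resnet.log.tmp file instead of the hard-coded samples; the most common non-zero code tells you where to look first.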

You mean the integer after the [1000000000.0] is the error code? I copied the first few lines of resnet.log.tmp; you can see the first measurement returns 4 and then 1 …

If this code is non-zero, does that mean the measurement simply failed?

I also checked my MobileNet tuning log; this column is mostly non-zero.

Here are the first few lines of mobilenet.log.tmp; it seems it only succeeds a few times.

So most of my measurements are unusable? If that's the case, it would account for the fact that no matter how long I tune, or how large I set n_trial, the tuned performance never gets higher.

How can I locate this error and fix it?

I posted my tuning options before, and they seem OK. Where should I check next? Hoping for your reply!

These are the error codes I copied from the class:

class MeasureErrorNo(object):
    """Error type for MeasureResult"""
    NO_ERROR = 0              # no error
    INSTANTIATION_ERROR = 1   # actively detected error in instantiating a template with a config
    COMPILE_HOST = 2          # error when compiling code on host (e.g. tvm.build)
    COMPILE_DEVICE = 3        # error when compiling code on device (e.g. OpenCL JIT on the device)
    RUNTIME_DEVICE = 4        # error when run program on device
    WRONG_ANSWER = 5          # answer is wrong when compared to a golden output
    BUILD_TIMEOUT = 6         # timeout during compilation
    RUN_TIMEOUT = 7           # timeout during run
    UNKNOWN_ERROR = 8         # unknown error

So error 4 is a runtime problem, which may be caused by device issues (e.g., unavailable device, out of device memory, etc.). 6 is a build timeout (default 10 s). If you want to dig into the root cause, you could isolate one config and look at the detailed errors (it's not an easy task, though).

Thank you. I found this file and the error codes. Just to confirm I didn't misunderstand your words:

If one measurement returns a non-zero error code, then the current configuration is simply skipped, is that the case? So should a successful tuning process contain no errors, or are a few error cases fine?

Yes, configs with a non-zero error code are skipped when looking for the best config in the tuning log. However, it's normal to have some non-zero configs. When a config doesn't fit your device well, it may get error code 1 at compile time (usually failing memory usage analysis), or error code 4 during measurement (improper thread number, memory size, etc.).
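The "skipped" behavior can be sketched in a few lines: when picking the best record, only entries with error code 0 are considered. This is a simplified illustration with made-up numbers, not AutoTVM's actual implementation:

```python
# Simplified sketch of how records with a non-zero error code are
# ignored when picking the best config (not AutoTVM's real code).
# Each record is (mean_cost_seconds, error_no); error code 0 = NO_ERROR.
records = [
    (1e9, 1),       # INSTANTIATION_ERROR: cost is a huge placeholder
    (0.00210, 0),   # valid measurement
    (1e9, 4),       # RUNTIME_DEVICE error
    (0.00185, 0),   # valid measurement, fastest so far
]

valid = [r for r in records if r[1] == 0]    # drop failed measurements
best_cost = min(cost for cost, _ in valid)   # lowest runtime wins
print(best_cost)  # 0.00185
```

So a few failed records do no harm; what matters is that at least some records have error code 0.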

So how can I judge whether the auto-tuning process is running normally?

I looked through the tmp file. It does have successful measurements, though not many: about 20%. But the GFLOPS is still 0.0.

Is that OK? I don't know whether to ignore the failed measurements or not, or how to judge whether the measured configuration is itself illegal or whether something is wrong with my system environment or tuning options.

Any experience or advice?

AutoTVM considers >150 failures an abnormal situation. Although I have never checked how reasonable this assumption is, I rarely get >150 failures in several thousand trials. I would say that if you are tuning a common model (e.g., ResNet-50) on a common device (V100), you shouldn't get that many errors. Your setup or system environment may have problems.

As long as you have one or more configs with error code 0, you should have at least one working config with >0 GFLOPS. One possible reason you see 0.0 GFLOPS is that the throughput is less than 0.05 GFLOPS, so it shows as "0.0" at %.1f precision. Anyway, I suggest taking a few failed configs and doing a deeper analysis.
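The rounding point is easy to check. For the 1x1 conv2d workload from this thread, a standard FLOP estimate for direct convolution is 2 · N · C_out · H_out · W_out · C_in · KH · KW (stride 2 gives a 7x7 output here); the 5-second runtime below is a hypothetical, extremely slow config chosen to land under the display precision:

```python
# Sketch: why a slow-but-successful config can still print as "0.0".
# Workload from this thread: input 1x1024x14x14, kernel 2048x1024x1x1,
# stride 2 -> output 1x2048x7x7.
flops = 2 * 1 * 2048 * 7 * 7 * 1024 * 1 * 1   # ~2.06e8 FLOPs per run

mean_cost = 5.0                     # hypothetical runtime in seconds
gflops = flops / mean_cost / 1e9    # ~0.041 GFLOPS
print("%.1f GFLOPS" % gflops)       # prints "0.0 GFLOPS"
```

Anything below 0.05 GFLOPS rounds down to "0.0" with that format string, even though the measurement itself succeeded.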

OK , got it. Thank you !

Here is a snapshot of mobilenet.log.tmp. I'm confused: why does the format change?

The "r" field is at the beginning of a line at first, but after a few lines the "r" field moves and the first field becomes "v".

It seems this problem is solved: it was due to my LLVM version.

I changed LLVM from 8.0.0 to 6.0.0 and rebuilt TVM.

Now the ResNet auto-tuning is running and the GFLOPS is no longer 0.0. GPU utilization is up to 100%. Thank you!

Nice. Now my troubleshooting guideline can have one more bullet :slight_smile: By the way, could you change the title by adding [SOLVED] at the beginning to indicate this issue has been resolved? Thanks.

It seems I don't have access to edit that post now?

I can see an edit button on my most recent reply. Is there a time limit on this?

Thank you for your answer, but I want to know how auto-tuning judges whether a config fits the device well. I also encountered error code 6 (compilation timeout), but if I skip the tuning process and feed that error config into the build (with autotvm.apply_history_best(‘error config’)), it still gives a good performance result. (I also tried increasing the timeout value, but it didn't seem to work.) Do you know why? Thanks.