signed_integer_overflow error when batch size is large

When I run the tutorial example with a large batch size, it fails with this error:
codegen/llvm/codegen_llvm.cc:692: unknown intrinsic signed_integer_overflow
The overflow occurs during simplification.
Is this expected behavior, or do we need to handle overflow here?
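
The error does not say which expression overflowed. As a rough illustration only of how a larger batch size can push an integer expression past the signed 32-bit range, here is a minimal sketch. The helper name `fits_in_int32` and the shape `(batch, 512, 224, 224)` are hypothetical and are not taken from the tutorial or from TVM.

```python
# Illustration only: checks whether the product of a set of loop extents
# still fits in a signed 32-bit integer. The shape is a made-up example,
# not the shape used by the tutorial.
INT32_MAX = 2**31 - 1


def fits_in_int32(extents):
    """Return True if the product of all extents is at most INT32_MAX."""
    total = 1
    for extent in extents:
        total *= extent  # Python ints are arbitrary precision, so no wraparound here
    return total <= INT32_MAX


for batch_size in (4, 128):
    shape = (batch_size, 512, 224, 224)  # hypothetical tensor shape
    print(batch_size, fits_in_int32(shape))
```

With this made-up shape the product fits for batch size 4 but not for batch size 128, so an int32 index or bound built from such extents would overflow only in the large-batch case.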

How to reproduce:
Set the batch size to 128 at https://github.com/dmlc/tvm/blob/be77cf1963292b018cdf241c595955ab4b3b5f44/tutorials/autotvm/tune_nnvm_cuda.py#L209
and then run the tutorial.

cc @tqchen

I tried batch_size = 4 in the tutorial example, and there seemed to be no problem.

For inference, a batch size of 128 is probably too large and would give very high latency.