NNVM reduction is broken


#1

Hi. It seems that a recent commit broke reductions in NNVM (I’m at current origin/master, bde53033b). For example, my small argmax test program fails with the error below. Please consider fixing this. What is the current status of the automated tests?

TVMError                                  Traceback (most recent call last)
<ipython-input-4-74ed2e3f3a21> in <module>()
----> 1 test_argmax()

~/src/mironov/argmax/test_argmax.py in test_argmax()
     36         print('numpy', np.argmax(data+1, axis=0))
     37         m.run(x=data)
---> 38         out = m.get_output(0, tvm.nd.empty(shape=oshape, dtype='int32'))
     39         print('out',out)
     40         return out

~/tvm/python/tvm/contrib/graph_runtime.py in get_output(self, index, out)
    176         """
    177         if out:
--> 178             self._get_output(index, out)
    179             return out
    180

~/tvm/python/tvm/_ffi/_ctypes/function.py in __call__(self, *args)
    183         check_call(_LIB.TVMFuncCall(
    184             self.handle, values, tcodes, ctypes.c_int(num_args),
--> 185             ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
    186         _ = temp_args
    187         _ = args

~/tvm/python/tvm/_ffi/base.py in check_call(ret)
     64     """
     65     if ret != 0:
---> 66         raise TVMError(py_str(_LIB.TVMGetLastError()))
     67
     68

TVMError: [12:54:53] /workspace/tvm/src/runtime/graph/graph_runtime.cc:153: Check failed: data->shape[j] == data_out->shape[j] (1 vs. 4)

#2

Your argmax is on the wrong axis, it should be 1 instead of 0.
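
For reference, the shape of the buffer passed to get_output has to match the shape the reduction actually produces, and that depends on the axis. Here is a NumPy-only sketch; the (4, 3) input is made up for illustration, since the real shapes from the test are not shown in this thread:

```python
import numpy as np

# Hypothetical input; the shapes used in the original test are not shown.
data = np.arange(12, dtype='float32').reshape(4, 3)

# Reducing over axis 0 removes the dimension of size 4.
out_axis0 = np.argmax(data, axis=0)
# Reducing over axis 1 removes the dimension of size 3.
out_axis1 = np.argmax(data, axis=1)

print(out_axis0.shape)  # (3,)
print(out_axis1.shape)  # (4,)
```

An output buffer allocated for one axis will not fit the result for the other, which is the kind of mismatch a shape check like the one in the traceback would catch.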


#3

Your argmax is on the wrong axis, it should be 1 instead of 0

The error stays the same if I change 0 to 1, but it goes away if I check out 543c42403a^. I think this is because that commit sets TOPI_REDUCE_ATLEAST1D to 0 (the new behavior for scalar shapes) for topi.cc but keeps the old behavior for nnvm/reduce.cc. I am not sure, but a conflict between the two seems likely.
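
To show what I mean by the two behaviors, here is a rough sketch in plain Python/NumPy, without TVM. The helper reduce_shape is hypothetical, and its atleast1d flag only mimics what I assume TOPI_REDUCE_ATLEAST1D controls, namely whether a full reduction yields shape (1,) or the scalar shape ():

```python
import numpy as np

def reduce_shape(in_shape, axes=None, atleast1d=False):
    """Hypothetical helper: output shape of a reduction without keepdims."""
    ndim = len(in_shape)
    if axes is None:
        axes = range(ndim)  # reduce over every axis
    axes = {a % ndim for a in axes}  # normalize negative axes
    out = tuple(d for i, d in enumerate(in_shape) if i not in axes)
    if not out and atleast1d:
        out = (1,)  # assumed old behavior: never return a 0-d shape
    return out

new_style = reduce_shape((1, 5, 2), atleast1d=False)  # scalar shape ()
old_style = reduce_shape((1, 5, 2), atleast1d=True)   # shape (1,)

# NumPy itself follows the scalar convention for a full reduction.
assert np.sum(np.zeros((1, 5, 2))).shape == new_style

print(len(new_style), 'vs.', len(old_style))
```

If one side infers the shape with one convention and the other side computes with the other, the number of dimensions disagrees and a check comparing them would fail.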


#4

I also hit this bug when importing both topi and nnvm. It actually depends on the order in which you import the two:

# commenting this line (or moving it below importing nnvm) will change the behaviour
import topi

import nnvm
import nnvm.symbol as sym
import nnvm.compiler

y = sym.Variable('y', shape=[1,5,2], dtype=0)
z = sym.sum(y)
graph = nnvm.graph.create(z)
nnvm.compiler.build(graph, target='llvm')

If topi is imported before nnvm, this fails with the error: Check failed: out[i].ndim() == out_info[i].ndim() (0 vs. 1) sum
For some reason this bug doesn’t manifest itself on CI servers.


#5

See https://github.com/dmlc/tvm/pull/2147