[Auto-tuning][CUDA][MXNet] Inference result is wrong after auto-tuning an MXNet model on an NVIDIA 1080 Ti GPU

Hi all,

I have a question. I found that the inference result of my model is wrong after TVM auto-tuning.

My model uses transpose_a in batch_dot, which is not supported by _mx_batch_dot in tvm/relay/frontend/mxnet.py, so I changed the converter in mxnet.py.

Here is the change I made:

def _mx_batch_dot(inputs, attrs):
    assert len(inputs) == 2
    a, b = inputs
    transpose_a = attrs.get_bool("transpose_a", False)
    transpose_b = attrs.get_bool("transpose_b", False)
    if transpose_a is True:
        # msg = 'Value {} in attribute "transpose_a" of operator batch_dot ' \
        #       'is not valid.'
        # raise tvm.error.OpAttributeInvalid(msg.format(transpose_a))
        a = _op.transpose(a, axes=[0, 2, 1])  # Lucien add: transpose a instead of raising
    if transpose_b is False:
        b = _op.transpose(b, axes=[0, 2, 1])
    return _op.nn.batch_matmul(a, b)
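
For reference (not part of the original post): Relay's nn.batch_matmul computes A @ B^T per batch, which is why b is transposed whenever transpose_b is False, and transposing a the same way reproduces MXNet's transpose_a semantics. This can be sanity-checked numerically with a minimal NumPy sketch (shapes and names here are illustrative):

import numpy as np

batch, m, k, n = 2, 3, 4, 5
a = np.random.rand(batch, k, m).astype("float32")  # layout when transpose_a=True
b = np.random.rand(batch, k, n).astype("float32")

# MXNet semantics: batch_dot(a, b, transpose_a=True) = a^T @ b per batch
expected = np.einsum("bkm,bkn->bmn", a, b)

# What the patched converter produces:
#   a' = transpose(a, [0, 2, 1])  -> (batch, m, k)
#   b' = transpose(b, [0, 2, 1])  -> (batch, n, k)   (transpose_b is False)
#   batch_matmul(a', b') = a' @ b'^T per batch
a_t = a.transpose(0, 2, 1)
b_t = b.transpose(0, 2, 1)
actual = np.einsum("bmk,bnk->bmn", a_t, b_t)

np.testing.assert_allclose(actual, expected, rtol=1e-5)
print("transpose_a handling matches batch_dot semantics")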

Could this change affect auto-tuning so that the inference result is wrong?

Regards

Lucien

In addition, before auto-tuning the inference result is correct.

This issue is not related to the transpose_a change. I fixed it and got the correct result after modifying my tuning script and commenting out some unused statements:

if name == 'arcface':
    # an example for an MXNet model
    print("use arcface mxnet model")
    mod, params = relay.frontend.from_mxnet(mx_sym, shape_dict, arg_params=arg_params, aux_params=aux_params)
    # Lucien modified: the softmax wrapper below is now commented out
    # net = mod["main"]
    # net = relay.Function(net.params, relay.nn.softmax(net.body), None, net.type_params, net.attrs)
    # mod = relay.Module.from_expr(net)
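
For anyone debugging a similar mismatch, one way to isolate the tuning step is to build the same module with and without the tuning log and compare the outputs. Below is a minimal sketch, assuming the mod, params, and shape_dict from the snippet above and the TVM 0.6-era API; the log file name is a placeholder:

import numpy as np
import tvm
from tvm import relay, autotvm
from tvm.contrib import graph_runtime

input_name, input_shape = list(shape_dict.items())[0]
data = np.random.uniform(size=input_shape).astype("float32")

def build_and_run(log_file=None):
    def _build():
        with relay.build_config(opt_level=3):
            return relay.build(mod, target="cuda", params=params)
    if log_file:
        # Apply the tuned schedules from the auto-tuning log
        with autotvm.apply_history_best(log_file):
            graph, lib, run_params = _build()
    else:
        graph, lib, run_params = _build()
    module = graph_runtime.create(graph, lib, tvm.gpu(0))
    module.set_input(input_name, data)
    module.set_input(**run_params)
    module.run()
    return module.get_output(0).asnumpy()

out_untuned = build_and_run()
out_tuned = build_and_run("arcface_tune.log")  # placeholder log file name
print("max abs diff:", np.abs(out_tuned - out_untuned).max())

A large difference here points at the tuned schedules (or at extra wrapping done only in the tuning script, like the softmax above) rather than at the frontend change.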

So what you mean is that the issue went away after removing the softmax layer appended at the end of the model?