[Issue of Auto Tuning] Mali GPU auto-tuning is not working properly

Hello!

I am trying to optimize a network running on the Mali GPU of an RK3399, using the current version of TVM (0.7dev1). I followed the auto-tuning tutorial, but I am running into problems.

TVM does not seem to generate valid OpenCL code for Mali. Judging from the log below, the problem seems to occur because an invalid identifier ('\n') is inserted into the generated OpenCL code.

TVMError: OpenCL build error for device=0x7f8b2d07f0:1:130: error: expected ‘)’\n__kernel void default_function_kernel1(__global float* restrict conv, __global float* restrict data_vec, __global void* restrict kernel) {\n ^\n\n:1:39: note: to match this ‘(’\n__kernel void default_function_kernel1(__global float* restrict conv, __global float* restrict data_vec, __global void* restrict kernel) {\n ^\n\n:1:130: error: parameter name omitted\n__kernel void default_function_kernel1(__global float* restrict conv, __global float* restrict data_vec, __global void* restrict kernel) {\n ^\n\n:7:401: error: expected expression\n conv[(((((((int)get_group_id(2)) * 3136) + (((int)get_local_id(2)) * 196)) + (((int)get_group_id(1)) * 28)) + (((int)get_group_id(0)) * 4)))] = (conv[(((((((int)get_group_id(2)) * 3136) + (((int)get_local_id(2)) * 196)) + (((int)get_group_id(1)) * 28)) + (((int)get_group_id(0)) * 4)))] + (data_vec[((((((int)get_group_id(1)) * 1792) + (((int)get_group_id(0)) * 256)) + ci))] * ((__global float*)kernel)[((((((int)get_group_id(2)) * 16384) + (((int)get_local_id(2)) * 1024)) + (ci * 4)))]));\n ^\n\n:8:413: error: expected expression\n conv[((((((((int)get_group_id(2)) * 3136) + (((int)get_local_id(2)) * 196)) + (((int)get_group_id(1)) * 28)) + (((int)get_group_id(0)) * 4)) + 1))] = (conv[((((((((int)get_group_id(2)) * 3136) + (((int)get_local_id(2)) * 196)) + (((int)get_group_id(1)) * 28)) + (((int)get_group_id(0)) * 4)) + 1))] + (data_vec[((((((int)get_group_id(1)) * 1792) + (((int)get_group_id(0)) * 256)) + ci))] * ((__global float*)kernel)[(((((((int)get_group_id(2)) * 16384) + (((int)get_local_id(2)) * 1024)) + (ci * 4)) + 1))]));\n

Looking at the log above, it seems that the OpenCL code fails to build because "\n" is repeatedly inserted.
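
To see exactly what the compiler is pointing at, the reported line:column positions can be looked up in the kernel source. Below is a minimal Python sketch, assuming the error text is available as a string with literal "\n" sequences. The `signature` value is copied from the log above; the helper names `token_at` and `unescape_newlines` are hypothetical and not part of TVM.

```python
import re

# First kernel source line, copied verbatim from the log above.
signature = (
    "__kernel void default_function_kernel1("
    "__global float* restrict conv, "
    "__global float* restrict data_vec, "
    "__global void* restrict kernel) {"
)


def token_at(line: str, column: int) -> str:
    """Return the identifier covering a 1-based column, else the single character there."""
    index = column - 1
    for match in re.finditer(r"[A-Za-z_]\w*", line):
        if match.start() <= index < match.end():
            return match.group()
    return line[index]


def unescape_newlines(log: str) -> str:
    """Turn literal backslash-n sequences from an escaped log into real newlines."""
    return log.replace("\\n", "\n")


print(token_at(signature, 39))   # column from "1:39: note: to match this '('"
print(token_at(signature, 130))  # column from "1:130: error: expected ')'"
```

With the line as pasted, column 39 is the opening parenthesis and column 130 falls on the parameter name `kernel`. Since `kernel` is a reserved word in OpenCL C, that parameter name may be what the compiler rejects, and the "\n" sequences may simply be newlines escaped by the error message rather than characters in the generated source.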

I think the problem lies in TVM's OpenCL code generation rather than in the device. Could you suggest a solution for this?