Question about fusing the CPU and GPU operators

I have written a INT4 GPU Conv2d and the single operator can run correctly, Then I tried to run the resnet50 with this newly developed Conv2d, but it seems that when running the model, some other quantization operators are fused into Conv2d, like “fused_nn_conv2d_add_right_shift_clip_cast_17”, and the running failed with the error

"ValueError: Direct host side access to device memory is detected. Did you forget to bind?"

The generated code also has some problems:

  # Output of GPU Conv2d
  produce Out {
    // attr [iter_var(blockIdx.z, , blockIdx.z)] thread_extent = 784
    // attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1
    // attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 16
    for (n.inner.o.inner.fused.outer.outer.outer.outer, 0, 2) {
      // attr [iter_var(threadIdx.z, , threadIdx.z)] thread_extent = 1
      // attr [iter_var(threadIdx.y, , threadIdx.y)] thread_extent = 1
      // attr [iter_var(threadIdx.x, , threadIdx.x)] thread_extent = 32
      Out[(((((n.inner.o.inner.fused.outer.outer.outer.outer*401408) + (floordiv(threadIdx.x, 8)*100352)) + (blockIdx.y*6272)) + (floormod(threadIdx.x, 8)*784)) + blockIdx.z)] = Conv[((((blockIdx.z*1024) + (blockIdx.y*64)) + (n.inner.o.inner.fused.outer.outer.outer.outer*32)) + threadIdx.x)]
    }
  }

  # Cast on CPU accesses "Out" on GPU
  produce T_cast {
    for (ax0, 0, 8) {
      for (ax1, 0, 128) {
        for (ax2, 0, 28) {
          for (ax3, 0, 28) {
            T_cast[((((ax0*100352) + (ax1*784)) + (ax2*28)) + ax3)] = int4(max(min(shift_right((Out[((((ax0*100352) + (ax1*784)) + (ax2*28)) + ax3)] + 16), 5), 7), -7))
          }
        }
      }
    }
  }
}

In produce T_cast, it accesses a GPU memory “Out” in CPU code, so I guess the problem is that the fusion operation fused some CPU operators into GPU operator and the CPU operator accesses the output of the GPU operator directly. I have no idea what to do about it, would be appreciated it if anyone could take a look and give me some suggestions