inputNdArray.copyFrom's CPU usage is high (up to 7%) on an Android ARM device, how can I optimize it?

I found that these two lines take up 6% of the CPU on an Android ARM device for a 3x112x112 input. That seems a bit too high for a mobile device.

NDArray inputNdArray = NDArray.empty(new long[]{1, IMG_CHANNEL, MODEL_INPUT_SIZE, MODEL_INPUT_SIZE}, new TVMType("float32"));
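For scale, a float32 tensor of shape 1 x 3 x 112 x 112 holds 37632 values, so each full copy into it moves 150528 bytes (147 KiB). The following is a minimal C sketch of that arithmetic. It assumes IMG_CHANNEL = 3 and MODEL_INPUT_SIZE = 112, as the 3x112x112 figure suggests. tensor_nbytes is a hypothetical helper, not a TVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes backing a dense tensor, i.e. the product of
 * all shape dimensions times the element size. For the float32 NCHW input
 * above, this is how much data one full copy has to move. */
size_t tensor_nbytes(const int64_t *shape, int ndim, size_t elem_size) {
    size_t n = elem_size;
    for (int i = 0; i < ndim; ++i) {
        assert(shape[i] >= 0); /* negative dims are invalid */
        n *= (size_t)shape[i];
    }
    return n;
}
```

If a frame rate of 30 frames per second is assumed, this comes to about 4.5 MB copied per second.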

I found that it eventually calls memcpy to copy the image. Is there any way to reduce the high CPU usage of these two lines?

#include <string.h>
void VTAMemCopyFromHost(void* dst, const void* src, size_t size) {
  memcpy(dst, src, size);
}
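One way to check whether the memcpy itself accounts for the 6% is to time it on its own at the input size, on the same device. The following is a minimal sketch, assuming a 3x112x112 float32 frame (150528 bytes). bench_copy is a hypothetical helper. The destination buffer is allocated once and reused for every copy. memcpy is called through a volatile function pointer, which is one portable way to stop the compiler from merging the repeated identical copies.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical microbenchmark: time `iters` copies of `size` bytes into one
 * preallocated destination buffer. Returns average nanoseconds per copy,
 * or -1.0 if allocation fails or the copied data does not match. */
double bench_copy(size_t size, int iters) {
    assert(iters > 0);
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    for (size_t i = 0; i < size; ++i) src[i] = (unsigned char)i;

    /* Calling through a volatile pointer prevents the copies being elided. */
    void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int k = 0; k < iters; ++k) {
        copy_fn(dst, src, size);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int ok = memcmp(dst, src, size) == 0;
    free(src);
    free(dst);
    if (!ok) return -1.0;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Multiply the per-copy time by the frame rate to get the fraction of one core spent in memcpy. If that fraction is far below 6%, the cost probably comes from elsewhere in the call path and not from memcpy itself.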