YOLOv3 inference takes so much time

https://docs.tvm.ai/tutorials/frontend/from_darknet.html

I tested the Darknet inference from the tutorial above. I changed target = llvm to target = cuda and measured the time using the function m.module.time_evaluator("run", ctx). The result is about 424 ms on a Tesla P40. Why does it take so much time?
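To make the measurement concrete, here is a minimal sketch of the averaging pattern I have in mind: a few warm-up calls are discarded, then each of `repeat` measurements reports the mean time over `number` calls. This is plain Python, not TVM code. `measure_ms` and `fake_run` are hypothetical names invented for illustration, and `fake_run` is only a stand-in for the real "run" function of the compiled module.

```python
import time


def measure_ms(func, number=10, repeat=3, warmup=1):
    """Return a list of `repeat` mean latencies in milliseconds.

    Hypothetical helper for illustration only; it is not part of TVM.
    """
    # Discard warm-up calls so one-time setup cost is not counted.
    for _ in range(warmup):
        func()

    results_ms = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            func()
        elapsed = time.perf_counter() - start
        # Mean time per call for this measurement, in milliseconds.
        results_ms.append(elapsed / number * 1000.0)
    return results_ms


def fake_run():
    # Placeholder workload standing in for the compiled module's "run".
    return sum(i * i for i in range(10000))


if __name__ == "__main__":
    timings = measure_ms(fake_run, number=10, repeat=3)
    print("mean per-call latency (ms) for each repeat:", timings)
```

If the 424 ms figure still shows up with warm-up calls excluded and several repeats averaged, then it is probably not caused by one-time setup cost.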

Thanks!!