[TOPI] [CUDA] Not in feed graph consumer = extern(argsort_gpu, 0xd3c1490) #3882

I want to use the argsort_gpu function in topi/cuda/sort.py to implement my NMS operator,
but I get this log:
/tvm/src/schedule/bound.cc:129: not in feed graph consumer = extern(argsort_gpu, 0xd3c1490)
The input tensor is scores: Tensor(shape=[1, 80, 15130], op.name=argsort_gpu)
I call it like this:
sorted_score = argsort_gpu(scores, axis=2, is_ascend=False, dtype="int32")
I do get the argsort result, but it is very slow: the argsort takes about 7000 ms, which is slower than np.argsort on the CPU.
Can someone suggest a solution or give me some pointers?
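
For reference, a minimal sketch of the CPU baseline used for comparison. It assumes random float32 scores with the same shape as above (the real data is not shown), and argsort_descending and scores_np are illustrative names, not part of TOPI:

```python
import time

import numpy as np


def argsort_descending(scores, axis=2, dtype="int32"):
    # Hypothetical helper: negating the input turns NumPy's ascending
    # argsort into a descending one, mirroring is_ascend=False.
    return np.argsort(-scores, axis=axis).astype(dtype)


# Assumed stand-in for the real scores tensor, same shape as in the post.
rng = np.random.default_rng(0)
scores_np = rng.random((1, 80, 15130), dtype=np.float32)

start = time.perf_counter()
sorted_idx = argsort_descending(scores_np, axis=2, dtype="int32")
elapsed_ms = (time.perf_counter() - start) * 1000.0
print(f"np.argsort on CPU: {elapsed_ms:.1f} ms")
```

The measured time depends on the machine, so no number is quoted here.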
