[gotvm] Inconsistent results under ab stress test

Hi guys,
I stress-tested an Inception V3 model served through gotvm using ab, and the output is as follows:

[GIN] 2019/03/28 - 08:20:17 | 201 |  531.624241ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":514}]
[GIN] 2019/03/28 - 08:20:17 | 201 |   521.56344ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":514}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  521.907464ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Soldier:1E+00;","name":"image_classify","time":544}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  551.764087ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":525}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  532.592126ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":526}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  533.228027ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":525}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  532.567306ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":525}]
[GIN] 2019/03/28 - 08:20:17 | 201 |   532.20224ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":525}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  532.341632ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Culture:1E+00;","name":"image_classify","time":519}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  526.499516ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":518}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  525.605048ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":509}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  516.395864ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":529}]
[GIN] 2019/03/28 - 08:20:17 | 201 |  535.433843ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":518}]
[GIN] 2019/03/28 - 08:20:17 | 201 |   526.13679ms |     10.13.16.24 | POST     /api/v1/image_classify/
CvLog      2019-03-28 08:20:17|[{"Result":"Map:1.3792035E-01;","name":"image_classify","time":518}]

The test was run at a concurrency level of 96, so the per-request inference time is large, but total QPS stays above 90 for 300-400 KB image inputs, which is quite good for batch_size=1.
However, the output occasionally carries a wrong label with an absurdly high probability (e.g. Soldier:1E+00 and Culture:1E+00 in the log above). Can anyone figure out why this happens?
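
For reference, the ab invocation looks roughly like this (the host, total request count, payload file, and content type below are placeholders, not my exact values):

```bash
# Roughly how the stress test was driven: 96 concurrent POSTs against the
# classify endpoint. Host, request count, payload file and content type
# are placeholders.
ab -n 10000 -c 96 -p image_payload.json -T 'application/json' \
   http://<server-host>/api/v1/image_classify/
```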

BTW, I also tested the Inception V3 model on TF Serving for batch inference with ab: QPS is around 160 at a concurrency level of 48, and if you enlarge the batch size further, QPS can exceed 200.

Can anyone help?

I suspect there might be a thread-safety issue somewhere, even though I have already added a mutex lock.
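
For context, here is a minimal sketch of how I assumed the lock should be applied (the `classifier` struct and `runInference` helper are placeholders for my real handler code, not the actual gotvm API):

```go
package main

import (
	"fmt"
	"sync"
)

// classifier wraps the single shared gotvm graph runtime;
// only one goroutine at a time may touch it.
type classifier struct {
	mu sync.Mutex
	// the shared gotvm module / graph-runtime handle would live here
}

// Classify holds the lock for the whole set_input -> run -> get_output
// sequence, including copying the output tensor out, so that a concurrent
// request cannot overwrite the shared input/output buffers mid-flight.
func (c *classifier) Classify(img []byte) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return runInference(img) // placeholder for the real gotvm calls
}

// runInference stands in for the actual gotvm set_input/run/get_output code.
func runInference(img []byte) (string, error) {
	return fmt.Sprintf("dummy result for %d bytes", len(img)), nil
}

func main() {
	c := &classifier{}
	res, _ := c.Classify([]byte("fake image bytes"))
	fmt.Println(res)
}
```

If gotvm keeps some additional shared state that a lock at this level does not cover, that could explain the occasional wrong label under load.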