I couldn't find a way in pandas to sort duplicated data directly, so I wrote my own library:
hash start
0.0944162694555
one+25% 2.67954554452
...
...
one+25% 71.079399709
hash done
74.1000116805
sort on 10,000 rows x 27 columns (7 columns actually processed)
sort start
0.651374954121
one+25% 3.27572733257
one+25% 5.80004310553
...
...
one+25% 73.1348870874
sort finish
73.4082792926
sort on 47,000 rows x 27 columns (7 columns actually processed)
preproc 47000
sort start
0.118014838665
one+25% 32.635356019
one+25% 63.6570617567
...
...
one+25% 885.678654599
sort finish
887.011638494
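
A rough way to read the two completed sort runs above is to fit a power law t ~ n^k through them. This is only a sketch: the power-law model is an assumption, there are just two data points, and the function name `growth_exponent` is made up for illustration. The times are the "sort finish" values from the logs.

```python
import math

# "sort finish" times copied from the logs above: rows -> seconds.
runs = {10_000: 73.4082792926, 47_000: 887.011638494}


def growth_exponent(n1, t1, n2, t2):
    """Exponent k of an assumed power law t ~ n**k through two (rows, seconds) points."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Order the runs by row count so the ratios are "larger run / smaller run".
(n1, t1), (n2, t2) = sorted(runs.items())
k = growth_exponent(n1, t1, n2, t2)

print(f"{n2 / n1:.1f}x rows -> {t2 / t1:.1f}x time, exponent ~ {k:.2f}")
```

With these numbers the exponent comes out at about 1.6, so the sort time grows faster than linearly with row count but slower than quadratically, at least between these two sizes.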
80,000 rows
preproc 80000
sort start
1.46617461846e-06
one+25% 106.000368499
one+25% 211.681905597