The commands are as follows:
- cat pdb_seqres.txt | grep -e "mol:protein" -A 1 \
| sed '/^--$/d' | grep -v "^>" \
| awk '{while(length($0)>=5){print substr($0,1,5);gsub(/^./,"")}}' \
| grep X -v \
> prot_5peptide.count
- time sort --parallel=1 prot_5peptide.count > prot_5peptide.count.sort1
- time sort --parallel=2 prot_5peptide.count > prot_5peptide.count.sort2
- time sort --parallel=4 prot_5peptide.count > prot_5peptide.count.sort4
- time sort --parallel=6 prot_5peptide.count > prot_5peptide.count.sort6
- time sort --parallel=8 prot_5peptide.count > prot_5peptide.count.sort8
- time sort --parallel=12 prot_5peptide.count > prot_5peptide.count.sort12
- time sort --parallel=16 prot_5peptide.count > prot_5peptide.count.sort16
- time sort --parallel=24 prot_5peptide.count > prot_5peptide.count.sort24
- time sort --parallel=32 prot_5peptide.count > prot_5peptide.count.sort32
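The steps above can be condensed into a small script. What follows is only a minimal sketch, not the exact procedure used for the timings in this post: the function names `five_mers` and `bench_sort` and the toy sequence `MKTAYIA` are made up for illustration, and it assumes GNU sort, since `--parallel` is a GNU coreutils option. It measures elapsed time with `date +%s` rather than `time`, so it only has whole-second resolution.

```shell
#!/bin/sh
# Print every overlapping 5-residue window of each input line
# (the same sliding-window idea as the awk step in the pipeline above).
five_mers() {
    awk '{ while (length($0) >= 5) { print substr($0, 1, 5); sub(/^./, "") } }'
}

# Sort the file once per thread count and report elapsed wall-clock seconds.
# Usage: bench_sort FILE N [N ...]
# Assumes GNU sort, which provides --parallel.
bench_sort() {
    bench_input=$1
    shift
    for n in "$@"; do
        start=$(date +%s)
        sort --parallel="$n" "$bench_input" > "$bench_input.sort$n"
        end=$(date +%s)
        echo "parallel=$n elapsed=$((end - start))s"
    done
}

# Toy check on a short, made-up sequence: prints MKTAY, KTAYI, TAYIA.
printf 'MKTAYIA\n' | five_mers
```

To repeat the whole series in one go, something like `bench_sort prot_5peptide.count 1 2 4 6 8 12 16 24 32` should produce the same `.sort1` through `.sort32` output files as the individual commands.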
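Overall, the run time stops improving much beyond eight cores: sorting 100 million records takes roughly 120 seconds with eight cores, and going up to 16, 24, or 32 cores only shaves off about another 20 seconds.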
References
- How to sort big files?
- unix sort algorithm implementation (this one says the algorithm used by sort is merge sort)
- Wikipedia: "Merge sort"
- YouTube video: "15 Sorting Algorithms in 6 Minutes"
- Wikipedia: comparison table in "Sorting algorithm"