- wget ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt
- grep "mol:protein" pdb_seqres.txt | cut -d' ' -f3 | cut -d':' -f2 | sort -n | head -n 1
- grep "mol:protein" pdb_seqres.txt | cut -d' ' -f3 | cut -d':' -f2 | sort -n | tail -n 1
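- The first command downloads the sequences of every PDB entry (FASTA format).
- The second prints the length of the shortest protein sequence: 2.
- The third prints the length of the longest protein sequence: 5037.
With the lengths known, the commands below show which PDB chains are the shortest and the longest.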
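The two length queries can also be answered in a single pass with awk instead of sorting the whole list twice. This is only a minimal sketch, assuming the header layout seen in this post (`>ID mol:protein length:N NAME`); `seqres_length_range` is a hypothetical helper name, not part of any PDB tool.

```shell
# Hypothetical helper: print "MIN MAX" protein chain lengths from a seqres file.
# Assumes headers look like: >1abc_A mol:protein length:154 NAME
seqres_length_range() {
    awk '/^>/ && $2 == "mol:protein" {
             split($3, kv, ":")      # $3 is "length:N"; kv[2] is N
             len = kv[2] + 0         # force numeric comparison
             if (!seen || len < min) min = len
             if (!seen || len > max) max = len
             seen = 1
         }
         END { if (seen) print min, max }' "$1"
}
```

Called as `seqres_length_range pdb_seqres.txt`, it prints the minimum and maximum on one line; for the file used in this post that should correspond to 2 and 5037.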
- grep "mol:protein" pdb_seqres.txt | grep "length:2 "
>1ahg_D mol:protein length:2 PHOSPHO-5'-PYRIDOXYL TYROSINE
>1lgc_H mol:protein length:2 DIPEPTIDE
>1lgc_I mol:protein length:2 DIPEPTIDE
>1lgc_J mol:protein length:2 DIPEPTIDE
>4d2c_D mol:protein length:2 L-ALANINE-L-PHENYLALANINE
>4m6g_B mol:protein length:2 L-alanine-iso-D-glutamine
- grep "mol:protein" pdb_seqres.txt | grep "length:5037 "
>3j8e_C mol:protein length:5037 Ryanodine receptor 1
>3j8e_E mol:protein length:5037 Ryanodine receptor 1
>3j8e_H mol:protein length:5037 Ryanodine receptor 1
>4uwa_A mol:protein length:5037 RYANODINE RECEPTOR 1
>4uwa_B mol:protein length:5037 RYANODINE RECEPTOR 1
>4uwa_C mol:protein length:5037 RYANODINE RECEPTOR 1
>4uwa_D mol:protein length:5037 RYANODINE RECEPTOR 1
>4uwe_A mol:protein length:5037 RYANODINE RECEPTOR 1
>4uwe_B mol:protein length:5037 RYANODINE RECEPTOR 1
>4uwe_C mol:protein length:5037 RYANODINE RECEPTOR 1
>4uwe_D mol:protein length:5037 RYANODINE RECEPTOR 1
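The lookups above hard-code the lengths 2 and 5037, which will change as the PDB grows. As a rough sketch under the same header-layout assumption (`>ID mol:protein length:N NAME`), the file can be read twice by awk: the first pass finds the extreme lengths, the second prints the matching headers. `seqres_extremes` is a hypothetical helper name.

```shell
# Hypothetical helper: print the headers of the shortest and longest protein chains.
# The file is passed to awk twice, so it must be a regular file (not stdin).
seqres_extremes() {
    awk '
        !/^>/ || $2 != "mol:protein" { next }      # keep protein headers only
        { split($3, kv, ":"); len = kv[2] + 0 }    # $3 is "length:N"
        NR == FNR {                                # first pass: track min and max
            if (!seen || len < min) min = len
            if (!seen || len > max) max = len
            seen = 1
            next
        }
        len == min || len == max { print }         # second pass: print matches
    ' "$1" "$1"
}
```

Called as `seqres_extremes pdb_seqres.txt`, it lists the matching headers in file order, so the shortest and longest chains appear interleaved rather than grouped.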
_EOF_