How Do I Get The FASTA Sequences Of Proteins From A List Of Protein PDB IDs?
I have a list of protein PDB IDs, along with the start and end positions of the sequence region I am interested in for each one. Is there an API, from PyMOL or elsewhere, that would give me a file listing the FASTA sequences of these proteins (if possible, restricted to the regions I am interested in)?
Thanks!
pdb
api
fasta
written 11.8 years ago by heath, updated 3.0 years ago by Ram
The following command seems to work:
$ echo -e "3I5F\n2p4k\n2p4m" | \
while read I; do curl -s "http://www.rcsb.org/pdb/rest/customReport?pdbids=${I}&customReportColumns=structureId,chainId,entityId,sequence,db_id,db_name&service=wsdisplay&format=text" | \
xsltproc stylesheet.xsl - ; done | \
fold -w 80
with stylesheet.xsl:
output:
>3I5F|A|1|O44934|O44934
MTMDFSDPDMEFLCLTRQKLMEATSIPFDGKKNCWVPDPDFGFVGAEIQSTKGDEVTVKTDKTQETRVVKKDDIGQRNPP
KFEMNMDMANLTFLNEASILHNLRSRYESGFIYTYSGLFCIAINPYRRLPIYTQGLVDKYRGKRRAEMPPHLFSIADNAY
QYMLQDRENQSMLITGESGAGKTENTKKVIQYFALVAASLAGKKDKKEEEKKKDEKKGTLEDQIVQCNPVLEAYGNAKTT
RNNNSSRFGKFIRIHFGTQGKIAGADIETYLLEKSRVTYQQSAERNYHIFYQLLSPAFPENIEKILAVPDPGLYGFINQG
TLTVDGIDDEEEMGLTDTAFDVLGFTDEEKLSMYKCTGCILHLGEMKWKQRGEQAEADGTAEAEKVAFLLGVNAGDLLKC
LLKPKIKVGTEYVTQGRNKDQVTNSIAALAKSLYDRMFNWLVRRVNQTLDTKAKRQFFIGVLDIAGFEIFDFNSFEQLCI
NYTNERLQQFFNHHMFVLEQEEYKKEGIVWEFIDFGLDLQACIELIEKPMGILSILEEECMFPKASDTSFKNKLYDNHLG
KNPMFGKPKPPKAGCAEAHFCLHHYAGSVSYSIAGWLDKNKDPINENVVELLQNSKEPIVKMLFTPPRILTPGGKKKKGK
SAAFQTISSVHKESLNKLMKNLYSTHPHFVRCIIPNELKTPGLIDAALVLHQLRCNGVLEGIRICRKGFPNRIIYSEFKQ
RYSILAPNAVPSGFADGKVVTDKALSALQLDPNEYRLGNTKVFFKAGVLGMLEDMRDERLSKIISMFQAHIRGYLMRKAY
KKLQDQRIGLTLIQRNVRKWLVLRNWEWWRLFNKVKPLL
>3I5F|B|2|P08052|P08052
AEEAPRRVKLSQRQMQELKEAFTMIDQDRDGFIGMEDLKDMFSSLGRVPPDDELNAMLKECPGQLNFTAFLTLFGEKVSG
TDPEDALRNAFSMFDEDGQGFIPEDYLKDLLENMGDNFSKEEIKNVWKDAPLKNKQFNYNKMVDIKGKAEDED
>3I5F|C|3|P05945|P05945
SQLTKDEIEEVREVFDLFDFWDGRDGDVDAAKVGDLLRCLGMNPTEAQVHQHGGTKKMGEKAYKLEEILPIYEEMSSKDT
GTAADEFMEAFKTFDREGQGLISSAEIRNVLKMLGERITEDQCNDIFTFCDIREDIDGNIKYEDLMKKVMAGPFPDKSD
>2P4K|A|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|B|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|C|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|D|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4M|A|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|B|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|C|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|D|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|E|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|F|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|G|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|H|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
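The command above returns whole chains, while the question also asks for a specific region of each protein. A minimal sketch of that last step in Python follows, assuming the FASTA text has the `>PDBID|chain|entity|...` headers shown in the output above and that the start and end positions are 1-based, inclusive indices into the printed sequence (not PDB residue numbers, which can differ). The names `parse_fasta`, `extract_regions` and the `regions` dictionary are hypothetical, chosen only for this illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs, joining sequences folded over several lines."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_regions(fasta_text, regions, width=80):
    """Return FASTA text cut down to regions[pdb_id] = (start, end).

    Assumption: start and end are 1-based, inclusive positions in the
    sequence as printed, and the PDB ID is the first '|'-separated
    field of each header.
    """
    # Compare PDB IDs case-insensitively (the input list mixes 3I5F and 2p4k).
    regions = {pdb_id.upper(): span for pdb_id, span in regions.items()}
    out = []
    for header, seq in parse_fasta(fasta_text):
        pdb_id = header.split("|")[0].upper()
        if pdb_id not in regions:
            continue
        start, end = regions[pdb_id]
        sub = seq[start - 1:end]
        if not sub:
            continue  # region lies outside this chain
        out.append(f">{header}|{start}-{end}")
        # Re-wrap the region, like `fold -w 80` in the shell pipeline.
        out.extend(sub[i:i + width] for i in range(0, len(sub), width))
    return "\n".join(out)


example = """>2P4K|A|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
"""
print(extract_regions(example, {"2p4k": (1, 10)}))
```

Every chain of a matching entry is sliced with the same region, so identical chains (such as 2P4K A to D) produce identical records; filtering on the chain field of the header would be a small extension if only one chain is wanted.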
Try pdb-tools; it includes a module called pdb_seq.py.
Thanks a lot! ^-^