download protein sequences from NCBI
8.2 years ago
guillaume.rbt ★ 1.0k

Hi all,

I would like to download all protein sequences from one species on NCBI:

https://www.ncbi.nlm.nih.gov/protein?linkname=bioproject_protein&from_uid=261773

This may be trivial, but is there a way to download all the sequences concatenated into a single FASTA file?

Thanks a lot,

Guillaume

fasta ncbi protein • 6.2k views

There is a "Send to" option through which you can download all the sequences. Afterwards, remove every FASTA header except the first to merge them into a single FASTA record:

awk 'BEGIN{a=0}{if($0~/^>/){if(a==0){print}a++;}else{print}}' input.fasta >out.fasta
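
To make the effect of this one-liner concrete, here is a small sketch on a toy file. The file names `toy.fasta` and `merged.fasta` and the two short sequences are made up for illustration. It shows that only the first header survives, so all residues end up in one record. If you want one record per protein, the downloaded multi-FASTA file can be used as is, without this step.

```shell
# Build a hypothetical two-record FASTA file for illustration.
printf '>seq1\nMKT\n>seq2\nGAV\n' > toy.fasta

# Same awk program as above: print the first header line,
# drop every later header line, and print all sequence lines.
awk 'BEGIN{a=0}{if($0~/^>/){if(a==0){print}a++;}else{print}}' toy.fasta > merged.fasta

# Show the merged result: one header followed by both sequences.
cat merged.fasta
```

On the toy input, `merged.fasta` contains the `>seq1` header followed by `MKT` and `GAV`.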

Thanks for the response. How do you use the "Send to" option? Is it on the console or on the website?

8.2 years ago

Using the NCBI web interface, you can just click on "Send to > File".

Or, using E-utilities:

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=bioproject&id=261773&linkname=bioproject_protein" |
  xmllint --xpath '//LinkSetDb' - |
  xmllint --format - |
  grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 |
  while read -r L; do
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${L}&rettype=fasta"
  done
>gi|821074095|gb|KKY28990.1| putative uracil phosphoribosyltransferase [Diplodia seriata]
MFVHASGPESIKFKHLQGQVQVLLVDSVINSGATILDFVEAIREINPGIRIVVVAGTVQAQCISPNNPFY
KTLAQHGDISLVALRSSETKFTGSGGTDTGNRLFNTTHLL

>gi|821074094|gb|KKY28989.1| putative integral membrane protein [Diplodia seriata]
MPQYFPWPYSVDPLPEDLRRGLWPVGIFALMSTVATLALLCWITYRLVSWRKHYRSYVGYNQYVLLIYNL
LLADLQQSISFLISFHWIHTDSMLAPSPACFGQAWLVQIGDISSGMFVLAIALHTFFSVVKGRQIPFRAF
LIGTIVIWALALLLTVLGPALHGSDYFTAAGAWCWASDKYETERLWLHYLWIFIIEFGTVIIYALIFIYL
RKQLVSIASAHQHSTQNKVSQAARYMVLYPLTYVLLTLPLAAGRMATMTGQTLPIAYYCAAGSMMTSCGW
VDAALYALTRRVLVSNEIDQPQGGAGKGASSSGGRTGYGGHGSSHTATGWDIASFSDRKGGMGADHSVTI
TGGLDARGSNFIDMDELSKGGVHHHATERVGRPKHKGSSTPSTQGLTRARSSSTSARESTPRGSTDSILA
GLGGVRAETKVEIRVEPANGFMLPGEGSGSNGSSGMSTPNGRTVEVVGNSHAMRPRSGSPY
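
The loop above sends one efetch request per protein ID. As a hedged sketch of an alternative, efetch also accepts a comma-separated list of IDs, so the IDs can be grouped into batches first. The function names `batch_ids` and `fetch_fasta_batches`, the file name `ids.txt`, the batch size of 200 and the one-second pause are all assumptions for illustration, not part of E-utilities.

```shell
# Group IDs read from stdin (one per line) into comma-separated batches.
# $1 is the batch size. Hypothetical helper, not part of E-utilities.
batch_ids() {
    xargs -n "$1" | tr ' ' ','
}

# Read comma-separated batches from stdin (one batch per line) and
# fetch each batch as FASTA with a single efetch request.
fetch_fasta_batches() {
    local base="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    while read -r ids; do
        curl -s "${base}?db=protein&id=${ids}&rettype=fasta&retmode=text"
        sleep 1  # assumed courtesy delay between requests
    done
}

# Example (needs network access; ids.txt is a hypothetical file
# holding one protein ID per line, e.g. the output of the elink step):
#   batch_ids 200 < ids.txt | fetch_fasta_batches > proteins.fasta
```

Redirecting the output to a file, as in the commented example, gives the single concatenated FASTA file the question asks for.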

8.2 years ago
tlorin ▴ 370

Here is a well-explained tutorial for your problem :)

The link you provided doesn't seem to work.

This should be better now, thanks!

Thank you all for your help! It works fine.

8.2 years ago
Sej Modha 5.3k

NCBI EDirect (Unix E-utilities) version of @pierre's solution:

esearch -db bioproject -query 261773 | elink -target protein | efetch -format fasta
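
Since the question asks for a single FASTA file, the output of a pipeline like this can be redirected to a file. Counting the header lines is then a quick sanity check against the number of proteins shown on the NCBI page. This is only a sketch: the helper name `count_fasta_records` and the file name `proteins.fasta` are hypothetical.

```shell
# Count FASTA records in a file by counting header lines,
# i.e. lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example (needs EDirect installed and network access):
#   esearch -db bioproject -query 261773 | elink -target protein | efetch -format fasta > proteins.fasta
#   count_fasta_records proteins.fasta
```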