Download protein sequences from NCBI
Using the NCBI web interface, you can just click on "Send to > File",
or use the E-utilities:
curl "https://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=bioproject&id=261773&linkname=bioproject_protein" | xmllint --xpath '//LinkSetDb' - | xmllint --format - | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 | while read -r L; do curl "https://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${L}&rettype=fasta" ; done
>gi|821074095|gb|KKY28990.1| putative uracil phosphoribosyltransferase [Diplodia seriata]
MFVHASGPESIKFKHLQGQVQVLLVDSVINSGATILDFVEAIREINPGIRIVVVAGTVQAQCISPNNPFY
KTLAQHGDISLVALRSSETKFTGSGGTDTGNRLFNTTHLL
>gi|821074094|gb|KKY28989.1| putative integral membrane protein [Diplodia seriata]
MPQYFPWPYSVDPLPEDLRRGLWPVGIFALMSTVATLALLCWITYRLVSWRKHYRSYVGYNQYVLLIYNL
LLADLQQSISFLISFHWIHTDSMLAPSPACFGQAWLVQIGDISSGMFVLAIALHTFFSVVKGRQIPFRAF
LIGTIVIWALALLLTVLGPALHGSDYFTAAGAWCWASDKYETERLWLHYLWIFIIEFGTVIIYALIFIYL
RKQLVSIASAHQHSTQNKVSQAARYMVLYPLTYVLLTLPLAAGRMATMTGQTLPIAYYCAAGSMMTSCGW
VDAALYALTRRVLVSNEIDQPQGGAGKGASSSGGRTGYGGHGSSHTATGWDIASFSDRKGGMGADHSVTI
TGGLDARGSNFIDMDELSKGGVHHHATERVGRPKHKGSSTPSTQGLTRARSSSTSARESTPRGSTDSILA
GLGGVRAETKVEIRVEPANGFMLPGEGSGSNGSSGMSTPNGRTVEVVGNSHAMRPRSGSPY
Here is a well-explained tutorial for your problem :)
The NCBI Unix E-utilities (Entrez Direct) version of @pierre's solution:
esearch -db bioproject -query 261773 | elink -target protein | efetch -format fasta
Another solution(s) implemented as a Python script:
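As a rough sketch of what a Python version could look like (not necessarily the script referred to here), the two E-utilities calls from the curl pipeline above can be reproduced with the standard library only. The function names, the `batch_size` parameter, and the choice of the `eutils.ncbi.nlm.nih.gov` base URL are my own assumptions for illustration; the `dbfrom`, `linkname`, `db`, and `rettype` parameters are the ones used in the curl commands.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed base URL for the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_url(bioproject_id):
    """Build the elink URL that maps a BioProject ID to its protein IDs."""
    params = {
        "dbfrom": "bioproject",
        "id": bioproject_id,
        "linkname": "bioproject_protein",
    }
    return EUTILS + "/elink.fcgi?" + urllib.parse.urlencode(params)


def parse_link_ids(xml_text):
    """Return the linked IDs found under LinkSetDb in an elink XML reply.

    The query ID itself sits under IdList, so it is not matched here.
    """
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(".//LinkSetDb/Link/Id")]


def efetch_url(protein_ids):
    """Build the efetch URL that returns FASTA for a list of protein IDs."""
    params = {
        "db": "protein",
        "id": ",".join(protein_ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(params)


def fetch_fasta(bioproject_id, batch_size=200):
    """Download all proteins linked to a BioProject as one FASTA string.

    batch_size is a hypothetical knob: IDs are requested in groups
    rather than one request per ID as in the shell loop.
    """
    with urllib.request.urlopen(elink_url(bioproject_id)) as resp:
        ids = parse_link_ids(resp.read().decode())
    chunks = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        with urllib.request.urlopen(efetch_url(batch)) as resp:
            chunks.append(resp.read().decode())
    return "".join(chunks)
```

Calling `fetch_fasta("261773")` should then return the same kind of FASTA records as the shell loop, and the result can be written to a file with an ordinary `open(..., "w")`.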
Kevin
There is a "Send to" option through which you can download all the sequences. Afterwards, just remove the FASTA headers to make a single FASTA.
Thanks for the response. How do you use the "Send to" option? Is this on the console or on the website?