Refseq proteins for several taxids
9.9 years ago
seta ★ 1.9k

Hi all,

My question may sound simple. I'm trying to download the plant RefSeq proteins from NCBI to build a BLAST database and run blastx on contigs from a de novo assembly of a non-model plant. Since there are several taxonomy IDs for plants, such as flowering plants (3398) and green plants (33090), please let me know how I can get all plant RefSeq protein sequences so that the database is as rich as possible. Please don't refer me to ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plant/, as it contains mixed RefSeq sequences, not just protein RefSeq. Thanks in advance.

next-gen blast RNA-Seq • 3.5k views
9.9 years ago
Siva ★ 1.9k

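You can download only the protein sequence files from the FTP URL you listed using curl. The bracketed range in the URL makes curl fetch each numbered file in turn, and #1 in the output name is replaced by the current number.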

curl -o "plant.#1.protein.faa.gz" "ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plant/plant.[1-87].protein.faa.gz"
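Once the files are downloaded, the remaining steps from the question (building the BLAST database and running blastx) could look roughly like the sketch below. It assumes NCBI BLAST+ is installed and on the PATH. The file names, the database name, and the e-value cutoff are illustrative choices, not part of the original answer. Also note that the upper bound of 87 reflects the release at the time of writing, so the number of files may differ in a newer release.

```shell
# Sketch only: all file and database names used with these helpers are hypothetical.

# Decompress and concatenate the downloaded chunks into one FASTA file.
# Usage: combine_chunks OUTPUT.faa CHUNK.gz [CHUNK.gz ...]
combine_chunks() {
    out="$1"
    shift
    gzip -dc "$@" > "$out"
}

# Build a protein BLAST database from the combined FASTA file.
# Usage: build_db INPUT.faa DB_NAME
build_db() {
    makeblastdb -in "$1" -dbtype prot -out "$2"
}

# Search nucleotide contigs against the protein database (tabular output).
# The 1e-5 e-value cutoff is an illustrative choice; adjust it to your needs.
# Usage: run_blastx CONTIGS.fasta DB_NAME OUTPUT.tsv
run_blastx() {
    blastx -query "$1" -db "$2" -evalue 1e-5 -outfmt 6 -out "$3"
}

# Example calls, left commented out so nothing runs by accident:
# combine_chunks plant_refseq_protein.faa plant.*.protein.faa.gz
# build_db plant_refseq_protein.faa plant_refseq_protein
# run_blastx contigs.fasta plant_refseq_protein contigs_vs_plant.tsv
```

Combining the chunks first gives a single database instead of one per downloaded file, which keeps the blastx call simple.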

Thanks a lot, friend. Is there a similar command to get the plant protein sequences from UniProt?


The following query will retrieve all sequences with keyword "Complete Proteome" from the taxonomy group "Viridiplantae".

http://www.uniprot.org/uniprot/?query=taxonomy%3A%22Viridiplantae+[33090]%22+keyword%3A%22Complete+proteome+[KW-0181]%22
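To run this query from the command line rather than the browser, something like the sketch below might work. The query string is the one from the URL above, decoded. The `format=fasta` parameter and the output file name are assumptions that are not part of the original reply, and the UniProt web interface may have changed since this was written, so the endpoint may need adapting.

```shell
# Query string decoded from the URL in the reply above.
uniprot_query() {
    printf '%s' 'taxonomy:"Viridiplantae [33090]" keyword:"Complete proteome [KW-0181]"'
}

# Fetch the matching entries as FASTA into the file given as $1.
# -G sends the URL-encoded fields as a GET query string; -L follows redirects.
# ASSUMPTION: the endpoint accepts format=fasta (not stated in the original reply).
fetch_uniprot_fasta() {
    curl -L -G "http://www.uniprot.org/uniprot/" \
        --data-urlencode "query=$(uniprot_query)" \
        --data-urlencode "format=fasta" \
        -o "$1"
}

# Example call, left commented out (hypothetical output name):
# fetch_uniprot_fasta viridiplantae_complete_proteome.fasta
```

Letting curl do the URL encoding avoids hand-escaping the quotes, colons, and brackets in the query.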