Question

Get reference full genome sequences for selected organisms

1

7.0 years ago

marongiu.luigi ▴ 730

Hello,

I would like to download all the full-length reference sequences for a given organism. I am using esearch, as described on the NCBI website, with the following command:

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]" | efetch -format fasta > genome.fa

where X is the NCBI Taxonomy ID of the given taxon. This works, but I get both 'complete genome' and 'complete sequence' entries.

Is it possible to get only the 'complete genome' entries? Thank you.
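A minimal sketch, assuming the goal is to keep only records whose title (FASTA definition line) contains the phrase "complete genome": Entrez queries accept a `[Title]` field tag, so the phrase can be added to the search itself. `build_query` is a hypothetical helper used only for illustration, and whether the phrase matches exactly the records you want depends on how NCBI titles them. The download step needs EDirect and network access, so it runs only when a taxonomy ID is passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: build an Entrez query for taxonomy ID $1, restricted
# to RefSeq records whose title contains the phrase "complete genome".
build_query() {
  printf '%s' "txid$1[Organism] AND refseq[filter] AND \"complete genome\"[Title]"
}

# Requires NCBI EDirect (esearch/efetch) and network access; only runs
# when a taxonomy ID is given on the command line.
if [ -n "$1" ]; then
  esearch -db "nucleotide" -query "$(build_query "$1")" | efetch -format fasta > genome.fa
fi
```

For example, `sh get_complete_genomes.sh 40051` would run the narrowed search for the bluetongue virus taxon (the script name is arbitrary).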

genome blast • 1.9k views

updated 7.0 years ago by Joseph Hughes ★ 3.0k • written 7.0 years ago by marongiu.luigi ▴ 730

score 2 · Answer 1 · 2018-01-05

2

7.0 years ago

Joseph Hughes ★ 3.0k

This is most likely because your species has a segmented genome or multiple chromosomes, so each segment or chromosome is deposited as a separate 'complete sequence' record. For example:

esearch -db "nucleotide" -query "txid40120[Organism] AND refseq[filter]" | efetch -format fasta > genome.fa

would retrieve 32 complete genomes but

esearch -db "nucleotide" -query "txid40051[Organism] AND refseq[filter]" | efetch -format fasta > genome.fa

would retrieve 10 complete sequences, one for each of the 10 segments of the bluetongue virus.

So the right approach depends on what exactly you want to retrieve.
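Before choosing, it may help to check how the downloaded headers split between the two kinds of record. A minimal sketch, assuming the FASTA headers contain the literal phrases "complete genome" or "complete sequence" as in the thread; `count_by_type` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA header lines in file $1 that contain
# "complete genome" and "complete sequence", respectively.
# grep -c prints 0 (and exits non-zero) when nothing matches, which is fine here.
count_by_type() {
  printf 'complete genome: %s\n' "$(grep '^>' "$1" | grep -c 'complete genome')"
  printf 'complete sequence: %s\n' "$(grep '^>' "$1" | grep -c 'complete sequence')"
}
```

Running `count_by_type genome.fa` on the bluetongue virus download described above should report mostly 'complete sequence' headers, one per segment.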

7.0 years ago by Joseph Hughes ★ 3.0k

0

Thank you, but the taxon I am looking for contains both complete genomes and complete sequences. Is there still a way to separate them, either directly with an esearch option or afterward by filtering the resulting FASTA file?
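For the second option, a minimal sketch of post-filtering the FASTA file with awk: each record is kept or dropped as a whole, depending on whether its header line contains "complete genome". It assumes the phrase appears verbatim (same case) in the headers, as in the thread; `filter_complete_genomes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print only the FASTA records whose header (a line
# starting with ">") contains "complete genome". The keep flag is reset at
# every header, so sequence lines follow the decision made for their header.
# Reads the files given as arguments, or stdin if none are given.
filter_complete_genomes() {
  awk '
    /^>/ { keep = (index($0, "complete genome") > 0) }
    keep { print }
  ' "$@"
}
```

Usage: `filter_complete_genomes genome.fa > complete_genomes.fa`. Working on whole records rather than a plain `grep` on headers keeps the sequence lines attached to the headers that survive the filter.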

7.0 years ago by marongiu.luigi ▴ 730