Efetch For Fully Sequenced Microbial Genomes?
12.8 years ago

Does anyone know a way to use efetch to get only fully sequenced microbial genomes? It is a convenient way to filter by taxonomy and submission date, but when I use 'complete[prop]' I also get expression vectors, e.g. 'Expression vector mce1' for M. tuberculosis. Alternatively, is there a way to filter out engineered sequences?

I'm just asking for the efetch query, not what to do with the resulting id list.

entrez taxonomy • 3.3k views

If I am not wrong, you can filter out taxonomy IDs 28384, 81077, and 12908.


Yep, that works in this case at least. If you make it an answer I'll happily accept it.

12.8 years ago
Rm 8.3k

If I am not wrong, you can filter out taxonomy IDs 28384, 81077, and 12908.


For the record, according to the NCBI taxonomy browser, both 28384 and 81077 are tax_ids for 'artificial sequences', and 12908 is for 'unclassified sequences', which contains metagenomic and environmental samples.
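
One way to write that exclusion into the search term itself is sketched below. It assumes the Entrez `txidNNNN[Organism:exp]` field syntax and uses a hypothetical helper name, `build_term`; only the three taxonomy IDs come from this thread, and the term is worth checking against the web interface before relying on it.

```python
# Taxonomy IDs named in this thread; everything else here is illustrative.
EXCLUDED_TAXIDS = (28384, 81077, 12908)


def build_term(organism, excluded_taxids=EXCLUDED_TAXIDS):
    """Build an Entrez term: complete sequences of `organism`, minus excluded taxa."""
    # [Organism:exp] asks Entrez to include all descendants of each taxon.
    excluded = " OR ".join(f"txid{taxid}[Organism:exp]" for taxid in excluded_taxids)
    # Boolean operators are written in uppercase, as Entrez expects.
    return f'("{organism}"[Organism] AND complete[Properties]) NOT ({excluded})'


print(build_term("Mycobacterium tuberculosis"))
```

The parentheses keep the exclusion list grouped, so the NOT applies to all three taxa at once.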

10.4 years ago
piet ★ 1.9k

Please be aware that the current taxonomy does not agree with the DNA sequence data for 'Mycobacterium tuberculosis'. The core genomes of M. tuberculosis and M. bovis are nearly 100% identical, so the two are clonal groups within a common species, called the 'Mycobacterium tuberculosis complex', or MTC for short.

Taxonomy IDs may change over time, so it is more robust to use taxon names in your query. Put a name in double quotes if it consists of more than one word.

"Mycobacterium tuberculosis complex"[Organism] AND complete[Properties]

Accession DQ823231.1 (Expression vector mce2, complete sequence) has two source records:

 source          1..24799
                 /organism="Expression vector mce2"
                 /mol_type="other DNA"
                 /db_xref="taxon:393135"
                 /focus
 source          4443..19156
                 /organism="Mycobacterium tuberculosis H37Rv"
                 /mol_type="other DNA"
                 /strain="H37Rv"
                 /db_xref="taxon:83332"

DQ823231.1 is included in the result set because of its second source record. The following query will exclude artificial sequences:

("Mycobacterium tuberculosis complex"[Organism] NOT "artificial sequences"[Organism]) AND complete[Properties]
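
For completeness, here is a minimal sketch of how a term like this could be sent to E-utilities. The ID list itself comes from ESearch, and EFetch then retrieves the records for those IDs. The helper name `esearch_url` and the `retmax` default are illustrative assumptions; the endpoint and the `db`, `term`, `retmax` and `retmode` parameters are the standard ESearch ones. The sketch only builds the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The query from the answer above, with NOT in uppercase.
TERM = ('("Mycobacterium tuberculosis complex"[Organism] '
        'NOT "artificial sequences"[Organism]) AND complete[Properties]')


def esearch_url(term, db="nuccore", retmax=100):
    """Return an ESearch request URL for `term` (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes quotes and brackets and turns spaces into '+'.
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_url(TERM)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return a JSON document whose `esearchresult.idlist` holds the matching IDs.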