8.0 years ago
peteladrien
I am currently retrieving information on sequences using the NCBI Entrez E-utilities API. The URL looks like this: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=JVEU01000013,HQ844023.1&rettype=gb&retmode=xml
and the output looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<GBSet>
<GBSeq>
<GBSeq_locus>JVEU01000013</GBSeq_locus>
<GBSeq_length>5266</GBSeq_length>
<GBSeq_strandedness>double</GBSeq_strandedness>
<GBSeq_moltype>DNA</GBSeq_moltype>
<GBSeq_topology>linear</GBSeq_topology>
<GBSeq_division>BCT</GBSeq_division>
<GBSeq_update-date>10-JUL-2015</GBSeq_update-date>
<GBSeq_create-date>10-JUL-2015</GBSeq_create-date>
<GBSeq_definition>Stenotrophomonas maltophilia strain 498_SMAL 1015_5266_269573_11+,1127+,970+, whole genome shotgun sequence</GBSeq_definition>
<GBSeq_primary-accession>JVEU01000013</GBSeq_primary-accession>
<GBSeq_accession-version>JVEU01000013.1</GBSeq_accession-version>
<GBSeq_other-seqids>
<GBSeqid>gb|JVEU01000013.1|</GBSeqid>
<GBSeqid>gnl|WGS:JVEU01|1015_5266_269573_11+,11></GBSeqid>
<GBSeqid>gi|876108632</GBSeqid>
</GBSeq_other-seqids>
<GBSeq_project>PRJNA267549</GBSeq_project>
<GBSeq_keywords>
<GBKeyword>WGS</GBKeyword>
</GBSeq_keywords>
<GBSeq_source>Stenotrophomonas maltophilia</GBSeq_source>
<GBSeq_organism>Stenotrophomonas maltophilia</GBSeq_organism>
<GBSeq_taxonomy>Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; Stenotrophomonas; Stenotrophomonas maltophilia group</GBSeq_taxonomy>
<GBSeq_references>
<GBReference>
<GBReference_authors>
<GBAuthor>Roach,D.J.</GBAuthor>
</GBReference_authors>
<GBReference_title>A Year of Infection in the Intensive Care Unit: Prospective Whole Genome Sequencing of Bacterial Clinical Isolates Reveals Cryptic Transmissions and Novel Microbiota</GBReference_title>
<GBReference_journal>PLoS Genet. 11 (7), E1005413 (2015)</GBReference_journal>
</GBSeq_references>
<GBSeq_comment>Source DNA available from Steve Salipante, University of Washington Department of Laboratory Medicine, Box 357110, 1959 NE Pacific Street, NW120 Seattle, WA 98195-7110; ##Genome-Assembly-Data-START## Assembly Method ABYSS v. 1.3.5 Genome Coverage 21x Sequencing Technology Illumina HiSeq ##Genome-Assembly-Data-END##</GBSeq_comment>
<GBSeq_feature-table>
<GBFeature>
<GBFeature_key>source</GBFeature_key>
<GBFeature_location>1..5266</GBFeature_location>
<GBFeature_intervals>
<GBInterval>
<GBInterval_from>1</GBInterval_from>
<GBInterval_to>5266</GBInterval_to>
<GBInterval_accession>JVEU01000013.1</GBInterval_accession>
</GBInterval>
</GBFeature_intervals>
<GBFeature_quals>
<GBQualifier>
[...]
</GBQualifier>
</GBFeature_quals>
</GBFeature>
</GBSeq_feature-table>
<GBSeq_sequence>g[...]catcccgaactcggaa</GBSeq_sequence>
<GBSeq_xrefs>
<GBXref>
</GBSeq_xrefs>
</GBSeq>
<GBSeq>
<GBSeq_locus>HQ844023</GBSeq_locus>
<GBSeq_length>942</GBSeq_length>
<GBSeq_strandedness>single</GBSeq_strandedness>
<GBSeq_moltype>RNA</GBSeq_moltype>
<GBSeq_topology>linear</GBSeq_topology>
<GBSeq_division>VRL</GBSeq_division>
<GBSeq_update-date>01-AUG-2011</GBSeq_update-date>
<GBSeq_create-date>01-AUG-2011</GBSeq_create-date>
<GBSeq_definition>Rotavirus A HC91xUK reassortant (UKg9KC-1) NSP3 protein gene, complete cds</GBSeq_definition>
<GBSeq_primary-accession>HQ844023</GBSeq_primary-accession>
<GBSeq_accession-version>HQ844023.1</GBSeq_accession-version>
<GBSeq_other-seqids>
<GBSeqid>gb|HQ844023.1|</GBSeqid>
<GBSeqid>gi|341832806</GBSeqid>
</GBSeq_other-seqids>
<GBSeq_source>Rotavirus A HC91xUK reassortant (UKg9KC-1)</GBSeq_source>
<GBSeq_organism>Rotavirus A HC91xUK reassortant (UKg9KC-1)</GBSeq_organism>
<GBSeq_taxonomy>Viruses; dsRNA viruses; Reoviridae; Sedoreovirinae; Rotavirus; Rotavirus A</GBSeq_taxonomy>
<GBSeq_references>
<GBReference>
<GBReference_reference>1</GBReference_reference>
<GBReference_position>1..942</GBReference_position>
<GBReference_authors>
<GBAuthor>Rippinger,C.M.</GBAuthor>
</GBReference_authors>
<GBReference_title>Genome sequences of the NIH UK-bovine reassortant vaccine components</GBReference_title>
<GBReference_journal>Unpublished</GBReference_journal>
</GBReference>
<GBReference>
</GBReference>
</GBSeq_references>
<GBSeq_feature-table>
<GBFeature>
<GBFeature_key>source</GBFeature_key>
<GBFeature_location>1..942</GBFeature_location>
<GBFeature_intervals>
<GBInterval>
</GBInterval>
</GBFeature_intervals>
<GBFeature_quals>
<GBQualifier>
[...]
</GBQualifier>
</GBFeature_quals>
</GBFeature>
<GBFeature>
<GBFeature_key>CDS</GBFeature_key>
[...]
</GBFeature_quals>
</GBFeature>
</GBSeq_feature-table>
<GBSeq_sequence>atgct[...]tgaatag</GBSeq_sequence>
</GBSeq>
</GBSet>
I would like to retrieve only GBSeq_accession-version, GBSeq_moltype, GBSeq_topology, GBSeq_organism and GBSeq_taxonomy, so that the output would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<GBSet>
<GBSeq>
<GBSeq_moltype>DNA</GBSeq_moltype>
<GBSeq_topology>linear</GBSeq_topology>
<GBSeq_accession-version>JVEU01000013.1</GBSeq_accession-version>
<GBSeq_organism>Stenotrophomonas maltophilia</GBSeq_organism>
<GBSeq_taxonomy>Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; Stenotrophomonas; Stenotrophomonas maltophilia group</GBSeq_taxonomy>
</GBSeq>
<GBSeq>
[...]
</GBSeq>
</GBSet>
Is there any way to specify the fields to retrieve in the Entrez query itself?
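Whether efetch can restrict the returned fields on the server side is the open question here. As a fallback, the reduction can be done client-side; below is a minimal sketch, assuming the efetch response is already available as a string. The names `filter_gbset` and `WANTED` are hypothetical, and the sample input is a shortened copy of the output above, not a real response.

```python
import xml.etree.ElementTree as ET

# Fields to keep, in the order they should appear in the reduced output
WANTED = [
    "GBSeq_moltype",
    "GBSeq_topology",
    "GBSeq_accession-version",
    "GBSeq_organism",
    "GBSeq_taxonomy",
]


def filter_gbset(xml_text, wanted=WANTED):
    """Build a new GBSet holding only the wanted children of each GBSeq."""
    root = ET.fromstring(xml_text)
    slim = ET.Element("GBSet")
    for seq in root.iter("GBSeq"):
        slim_seq = ET.SubElement(slim, "GBSeq")
        for tag in wanted:
            node = seq.find(tag)
            if node is not None:  # skip fields absent from this record
                ET.SubElement(slim_seq, tag).text = node.text
    return slim


# Shortened stand-in for the efetch response (assumption: same element names)
sample = """<GBSet>
  <GBSeq>
    <GBSeq_locus>JVEU01000013</GBSeq_locus>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_topology>linear</GBSeq_topology>
    <GBSeq_accession-version>JVEU01000013.1</GBSeq_accession-version>
    <GBSeq_organism>Stenotrophomonas maltophilia</GBSeq_organism>
    <GBSeq_taxonomy>Bacteria; Proteobacteria</GBSeq_taxonomy>
  </GBSeq>
  <GBSeq>
    <GBSeq_locus>HQ844023</GBSeq_locus>
    <GBSeq_moltype>RNA</GBSeq_moltype>
    <GBSeq_topology>linear</GBSeq_topology>
    <GBSeq_accession-version>HQ844023.1</GBSeq_accession-version>
    <GBSeq_organism>Rotavirus A HC91xUK reassortant (UKg9KC-1)</GBSeq_organism>
    <GBSeq_taxonomy>Viruses; dsRNA viruses</GBSeq_taxonomy>
  </GBSeq>
</GBSet>"""

slim = filter_gbset(sample)
print(ET.tostring(slim, encoding="unicode"))
```

This still downloads the full records, so it only trims the output and does not reduce the transfer size.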