Can't get accession number using Entrez.
3.5 years ago

I have the following Python code in which I am trying to get the sequence of a TMV replicase gene:

from Bio import Entrez, SeqIO

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

# Lookup ID
search = Entrez.esearch(db='gene', term='Tobacco mosaic virus[Orgn] replicase', idtype="acc")
read = Entrez.read(search)
idlist = read["IdList"]

# Get sequence
search = Entrez.efetch(db='gene', id=idlist[0], retmode='text', rettype='gb')
read = SeqIO.read(search, "genbank")
sequence = read.seq

However, using idtype="acc" doesn't give me an accession number. Instead, I get 1494081, the ID for the gene in the gene database.

When I then try to fetch that ID from the gene database, it throws ValueError: No records found in handle.

I've also tried fetching from the nucleotide database, but without an accession number the numeric ID resolves to the wrong record.


Using EntrezDirect:

$ esearch -db gene -query "Tobacco mosaic virus [Orgn] AND replicase"| esummary | xtract -pattern GenomicInfoType -element ChrAccVer,ChrStart,ChrStop
NC_001367.1 68  3418
NC_001367.1 68  4918

To get the sequence:

$ esearch -db gene -query "Tobacco mosaic virus [Orgn] AND replicase"| esummary | xtract -pattern GenomicInfoType -element ChrAccVer,ChrStart,ChrStop | xargs -n 3 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta'

Is there an equivalent to xtract for biopython? I can't seem to find one.
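
As far as I know there is no direct xtract counterpart in Biopython, but Entrez.read() parses the esummary XML into nested dicts and lists, so the same fields can be picked out by key. Below is a rough sketch of the pipeline above, not a tested recipe. The function names extract_regions and fetch_gene_sequences are made up for illustration, and the key path DocumentSummarySet / DocumentSummary / GenomicInfo is an assumption about how the gene esummary is parsed. The +1 offset is also an assumption: ChrStart and ChrStop look 0-based, while efetch's seq_start and seq_stop are 1-based, so check a known gene before relying on it.

```python
def extract_regions(summary, offset=1):
    """Pull (accession, start, stop, strand) tuples out of a parsed gene esummary.

    `summary` is assumed to be the result of Entrez.read() on an
    esummary call against db="gene". `offset` converts the (assumed)
    0-based ChrStart/ChrStop values to efetch's 1-based coordinates.
    """
    regions = []
    docs = summary["DocumentSummarySet"]["DocumentSummary"]
    for doc in docs:
        # Some gene records carry no genomic placement; skip those.
        for info in doc.get("GenomicInfo", []):
            start = int(info["ChrStart"]) + offset
            stop = int(info["ChrStop"]) + offset
            # start > stop is taken to mean the gene is on the minus strand.
            strand = 1 if start <= stop else 2
            regions.append(
                (info["ChrAccVer"], min(start, stop), max(start, stop), strand)
            )
    return regions


def fetch_gene_sequences(term, email):
    """Search db=gene, then fetch each gene's region from nuccore as FASTA."""
    # Imported here so extract_regions can be used without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address

    with Entrez.esearch(db="gene", term=term) as handle:
        ids = Entrez.read(handle)["IdList"]
    if not ids:
        return []

    with Entrez.esummary(db="gene", id=",".join(ids)) as handle:
        summary = Entrez.read(handle)

    records = []
    for accession, start, stop, strand in extract_regions(summary):
        with Entrez.efetch(
            db="nuccore",
            id=accession,
            rettype="fasta",
            retmode="text",
            seq_start=start,
            seq_stop=stop,
            strand=strand,
        ) as handle:
            records.append(SeqIO.read(handle, "fasta"))
    return records
```

Calling fetch_gene_sequences("Tobacco mosaic virus[Orgn] AND replicase", "you@example.com") should return one SeqRecord per GenomicInfo entry, and each record's .seq holds the sequence. The address is a placeholder. If Entrez.read() complains while parsing the summary, passing validate=False may help.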
