Question

How to find which genes lie between a start and end position in a genome? BLAST in Python

4.3 years ago

seok1213neo ▴ 40

I am working with data retrieved from IslandViewer.

The list file gives the accession numbers of the genomes and the start and end positions of each genomic island.

for example:

accession      start      end
NC_000853.1    43768      64411
NC_000853.1    659002     664820
NC_000907.1    1498827    1513221

and so on
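
A list in this layout could be read as sketched below. This assumes whitespace-separated columns with a header row; `parse_islands` and the `Island` tuple are hypothetical names used for illustration only.

```python
from typing import NamedTuple


class Island(NamedTuple):
    accession: str
    start: int
    end: int


def parse_islands(text: str) -> list[Island]:
    """Parse whitespace-separated 'accession start end' rows."""
    islands = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not
        # an accession followed by two integer positions
        if len(fields) != 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        islands.append(Island(fields[0], int(fields[1]), int(fields[2])))
    return islands


example = """accession start end
NC_000853.1 43768 64411
NC_000853.1 659002 664820
NC_000907.1 1498827 1513221
"""

for island in parse_islands(example):
    print(island.accession, island.start, island.end)
```

Each `Island` can then be handled in a loop, one region at a time, instead of hard-coding a single accession and coordinate pair.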

I want to find out which genes lie within each start-end range.

So I wrote the following code, using the first entry from the list above:

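from Bio import Entrez, SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.Blast import NCBIWWW

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

with Entrez.efetch(db="nucleotide", rettype="fasta", id="NC_000853.1", retmode="text") as handle:
    record = SeqIO.read(handle, "fasta")

# FeatureLocation is 0-based and end-exclusive; subtract 1 from the start
# if the positions in the list are 1-based
feature = SeqFeature(FeatureLocation(43768, 64411, strand=1), type="gene")
g_island = feature.extract(record.seq)

# qblast returns a handle to the XML result, so read it instead of printing the handle
result = NCBIWWW.qblast("blastn", "nt", str(g_island))
print(result.read())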

I extracted the sequence between the start and end positions and tried to BLAST it from within Biopython, but it didn't work. How should I parse them?
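
Since the goal is to list annotated genes rather than to find similar sequences, one alternative to BLAST is to read the feature annotations of the GenBank record and keep those that overlap the island. A minimal sketch, assuming the features come from a record parsed with `SeqIO.read(handle, "genbank")` and that the island coordinates are 1-based and inclusive (an assumption, not confirmed above); `genes_in_range` is a hypothetical helper, not part of Biopython.

```python
def genes_in_range(features, start, end, feature_type="CDS"):
    """Return (start, end, label) for features overlapping [start, end].

    start and end are assumed to be 1-based and inclusive. Biopython
    feature locations are 0-based and end-exclusive, so start is shifted.
    """
    lo, hi = start - 1, end
    hits = []
    for feature in features:
        if feature.type != feature_type:
            continue
        f_start = int(feature.location.start)
        f_end = int(feature.location.end)
        # Two half-open intervals overlap when each starts before the other ends
        if f_start < hi and f_end > lo:
            qualifiers = feature.qualifiers
            label = (qualifiers.get("gene") or qualifiers.get("locus_tag") or ["unknown"])[0]
            # Report positions back in 1-based, inclusive form
            hits.append((f_start + 1, f_end, label))
    return hits


# With Biopython, the features would come from a GenBank record, for example:
#   from Bio import Entrez, SeqIO
#   Entrez.email = "you@example.com"  # placeholder address
#   with Entrez.efetch(db="nucleotide", id="NC_000853.1",
#                      rettype="gbwithparts", retmode="text") as handle:
#       record = SeqIO.read(handle, "genbank")
#   for hit in genes_in_range(record.features, 43768, 64411):
#       print(hit)
```

This keeps any feature that overlaps the range even partially; requiring `f_start >= lo and f_end <= hi` instead would keep only features that lie entirely inside the island.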

genome gene genbank • 1.3k views

Hello seok1213neo,

You should show us exactly what your data looks like. Otherwise we can only guess.