Parse a genome region (start to end position) to find which genes it contains? BLAST in Python
3.9 years ago
seok1213neo ▴ 40

I am working with data retrieved from IslandViewer. The list file gives the accession numbers of genomes and the start and end positions of the genomic islands in each of them.


For example:

accession    start    end
NC_000853.1  43768    64411
NC_000853.1  659002   664820
NC_000907.1  1498827  1513221

and so on.
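A list like this can be read into Python before any lookup is done. The following is a minimal sketch, assuming the file is whitespace-separated with a single header row as in the example; the function name `read_islands` is made up for illustration.

```python
def read_islands(lines):
    """Parse 'accession start end' lines into (accession, start, end) tuples."""
    islands = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row (its start/end columns are not numbers)
        if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        accession, start, end = fields
        islands.append((accession, int(start), int(end)))
    return islands


example = """accession start end
NC_000853.1 43768 64411
NC_000853.1 659002 664820
NC_000907.1 1498827 1513221
"""

islands = read_islands(example.splitlines())
for accession, start, end in islands:
    print(accession, start, end)
```

For a real file, `read_islands(open(path))` should work the same way, since a file object also yields one line at a time.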


I want to find out which genes lie within those start and end positions.

So I wrote the following code, using the first entry from the list above:

from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW

Entrez.email = "you@example.org"  # NCBI asks for a contact address; replace with your own

# Download the genome sequence as FASTA
with Entrez.efetch(db="nucleotide", rettype="fasta", id="NC_000853.1", retmode="text") as handle:
    record = SeqIO.read(handle, "fasta")

# The list coordinates are assumed to be 1-based and inclusive,
# so the Python slice starts at start - 1
start, end = 43768, 64411
g_island = record.seq[start - 1:end]

# qblast returns a handle to the XML result, so read it instead of printing the handle
result = NCBIWWW.qblast("blastn", "nt", str(g_island))
print(result.read())

I extracted the sequence between the start and end positions and tried to BLAST it from within Biopython, but it didn't work. How should I parse the results?
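If the goal is only to see which annotated genes fall inside each island, one possible alternative to BLAST is to download the annotated GenBank record and filter its features by position. The sketch below is not from the original post. It assumes the list coordinates are 1-based and inclusive, that the record carries `gene` features, and that any gene overlapping the island counts as a hit. The names `genes_in_region` and `fetch_genbank` are made up for illustration, and the e-mail address is a placeholder.

```python
def genes_in_region(record, start, end, feature_types=("gene",)):
    """Return (name, start, end, strand) for features overlapping start..end.

    start and end are treated as 1-based and inclusive (an assumption about
    the list file). Biopython locations are 0-based with an exclusive end.
    """
    hits = []
    for feature in record.features:
        if feature.type not in feature_types:
            continue
        f_start = int(feature.location.start) + 1  # convert to 1-based
        f_end = int(feature.location.end)          # exclusive 0-based end == inclusive 1-based end
        if f_start <= end and f_end >= start:      # any overlap with the region
            quals = feature.qualifiers
            name = quals.get("locus_tag", quals.get("gene", ["?"]))[0]
            hits.append((name, f_start, f_end, feature.location.strand))
    return hits


def fetch_genbank(accession, email):
    """Download an annotated GenBank record from NCBI (needs network access)."""
    # Imported here so that genes_in_region itself has no dependencies
    from Bio import Entrez, SeqIO

    Entrez.email = email
    with Entrez.efetch(db="nucleotide", id=accession,
                       rettype="gbwithparts", retmode="text") as handle:
        return SeqIO.read(handle, "genbank")
```

With network access, `genes_in_region(fetch_genbank("NC_000853.1", "you@example.org"), 43768, 64411)` should return the genes overlapping the first island in the list. Passing `feature_types=("CDS",)` would report coding sequences instead.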

genome gene genbank • 1.2k views

Hello seok1213neo,

You should show us exactly what your data looks like. Otherwise we can only guess.

fin swimmer


Thank you for your reply! I have edited my post.
