5.7 years ago
PaSua
Python newbie here.
I was wondering if there is a way to get the sequence of a genome region from NCBI, given start and end positions. For instance, I'm working with this genome ID (NC_011375.1) and I would like to obtain the sequence between positions 259882 and 259896. So far, I have this:
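```python
from Bio import Entrez
from Bio import SeqIO

Entrez.email = "my@email.org"  # NCBI asks for a contact address

# Download the GenBank record for the genome and parse it
handle = Entrez.efetch(db="nuccore",
                       id="NC_011375.1",
                       rettype="gb",
                       retmode="text")
whole_sequence = SeqIO.read(handle, "genbank")
handle.close()

# Slice the record and print the region of interest
print(whole_sequence[259882:259896])
```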
And this is the output I get:
```
ID: NC_011375.1
Name: NC_011375
Description: Streptococcus pyogenes NZ131, complete genome.
Number of features: 0
UnknownSeq(14, alphabet = IUPACAmbiguousDNA(), character = 'N')
```
As you can see, it's not working: the slice contains only unknown bases (N). Since I don't know how to proceed, any help would be appreciated.
Thank you in advance.
I don't know the syntax for this command, but keep in mind that Python uses 0-based indexing, so the first base is actually at position 0, not 1; you must adjust your coordinates accordingly.
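A small sketch of that adjustment, assuming the coordinates come from NCBI as 1-based and inclusive at both ends. Python slices are 0-based and exclude the stop index, so only the start moves down by one. The helper name `one_based_to_slice` is hypothetical, introduced here for illustration. Note that the question's slice `[259882:259896]` returns 14 bases (matching `UnknownSeq(14, ...)` above) and starts one base too late; the converted slice `[259881:259896]` returns all 15 bases.

```python
def one_based_to_slice(start: int, end: int) -> slice:
    """Convert a 1-based, inclusive (start, end) interval, as NCBI reports
    coordinates, into a 0-based, half-open Python slice (hypothetical helper)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Shift only the start down by one; the end stays the same because
    # Python slices exclude their stop index.
    return slice(start - 1, end)


# Toy sequence so positions are easy to check by eye: base 1 is 'A'.
toy = "ACGTACGTAC"
print(toy[one_based_to_slice(3, 6)])  # bases 3..6 inclusive -> GTAC
```

With the record from the question, this would be `whole_sequence[one_based_to_slice(259882, 259896)]`, or simply `whole_sequence[259881:259896]`.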
Solved. I wasn't using the correct ID: it needs to be the GenBank (CP...) accession, not the RefSeq NC_ one. Anyway, thank you, because I also needed to adjust the positions according to Python's indexing, as you said.
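As an alternative, NCBI's EFetch E-utility accepts `seq_start` and `seq_stop` parameters (1-based, inclusive), so the server returns only the requested region instead of the whole genome, and no Python index shift is needed. The sketch below uses only the standard library (`urllib`) rather than Biopython so it is self-contained; `Bio.Entrez.efetch` forwards extra keyword arguments to EFetch, so passing `seq_start=...` and `seq_stop=...` there should behave similarly. The function names are hypothetical, and it assumes the record returns its sequence in FASTA format.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, start: int, end: int, email: str) -> str:
    """Build an EFetch URL requesting one region of a nucleotide record.

    start and end are 1-based and inclusive, i.e. the coordinates you would
    read off the NCBI record; NCBI does the slicing on its side.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": end,
        "email": email,  # NCBI asks clients to identify themselves
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta_sequence(fasta_text: str) -> str:
    """Return the sequence of a single-record FASTA string, header removed."""
    lines = fasta_text.strip().splitlines()
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def fetch_region(accession: str, start: int, end: int, email: str) -> str:
    """Download the region from NCBI and return it as a plain string.

    Requires network access.
    """
    with urlopen(build_efetch_url(accession, start, end, email)) as response:
        return parse_fasta_sequence(response.read().decode("utf-8"))
```

Usage (network required): `fetch_region("NC_011375.1", 259882, 259896, "my@email.org")`, or substitute the accession that works for your record.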
I'm posting the solution here, hoping someone will find it useful:
output: