Hi everyone.
So I have one genome (NC_007779.1) and a sequence (AGAAGTGCCAGACT) that belongs to it. Since I am trying to develop a tool for aligning binding sites, I would like to get the start and end positions of the sequence in this genome, and also get the sequence with extra bases at both the start and the end. I think the best tool for doing this would be Biopython, but since I am not very familiar with the package, I don't know how to approach this problem using it. Any suggestions?
Thank you in advance.
Have you had a look at any of the biopython tutorials?
Yes, mainly the official one http://biopython.org/DIST/docs/tutorial/Tutorial.html
Can you clarify what you mean by "also get the sequence with extra bases"?
Biopython will work, but it's unlikely to be very fast, especially if your genome is large. String matching in particular can take a good while.
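For what it's worth, here is a minimal sketch of one way to do this, assuming "extra bases" means a fixed number of flanking bases on each side of the match. It works on plain Python strings, so Biopython is only needed to load the genome. The names `find_site`, `reverse_complement` and `flank` are made up for this example, and it treats the genome as linear, so a site spanning the origin of a circular chromosome would be missed. For a single bacterial-sized sequence a plain string search like this is often adequate, but I have not timed it on this genome.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_site(genome, site, flank=0):
    """Find every occurrence of `site` on either strand of `genome`.

    Returns a sorted list of (start, end, strand, context) tuples.
    start/end are 0-based, end-exclusive coordinates on the forward strand;
    context is the forward-strand slice widened by `flank` bases per side,
    clipped at the sequence ends (circularity is ignored).
    """
    genome = genome.upper()
    site = site.upper()
    queries = [("+", site)]
    rc = reverse_complement(site)
    if rc != site:  # a palindromic site would otherwise be reported twice
        queries.append(("-", rc))

    hits = []
    for strand, query in queries:
        pos = genome.find(query)
        while pos != -1:
            end = pos + len(query)
            left = max(0, pos - flank)
            right = min(len(genome), end + flank)
            hits.append((pos, end, strand, genome[left:right]))
            pos = genome.find(query, pos + 1)  # step by 1 to allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    # Toy genome for illustration only; not real NC_007779.1 sequence.
    toy_genome = "TTT" + "AGAAGTGCCAGACT" + "GGG"
    for start, end, strand, context in find_site(toy_genome, "AGAAGTGCCAGACT", flank=2):
        # Add 1 to start if you want 1-based, inclusive coordinates.
        print(start, end, strand, context)
```

To run it on the real genome, you could load the record with `SeqIO.read(path, "fasta")` from `Bio` and pass `str(record.seq)` as `genome`, where `path` points to wherever you saved the FASTA file for NC_007779.1. Hits on the minus strand are reported with forward-strand coordinates and forward-strand context, so reverse-complement the context if you need it in the orientation of the site.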