After many years of using Perl, I am starting to learn Python. As an example, I want to perform regular expression matching on sequences extracted from a FASTA file, which is parsed with Biopython's SeqIO module. In the following code, re.findall fails when searching for iupac in seq_record.seq; however, if the latter is replaced with a string, e.g. 'TTAATT', a match is found. The error is: TypeError: expected string or buffer.
# biopython
from Bio import SeqIO
# regex library
import re
# file with FASTA sequence
infile = "fasta.fa"
# pattern to search for
iupac = "taat"
# look through each FASTA sequence in the file
for seq_record in SeqIO.parse(infile, "fasta"):
    print "Sequence ID: ", seq_record.id, "; ", len(seq_record), "bp"
    print seq_record.seq
    # scan for IUPAC; re.I makes search case-insensitive
    matches = re.findall(iupac, seq_record.seq, re.I)
    if matches:
        print "Matches = ", len(matches)
Thanks for any guidance!
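The likely cause is that seq_record.seq is a Biopython Seq object rather than a plain string, and re.findall in this setup only accepts a string or buffer. Converting it with str() before matching should avoid the TypeError. Here is a minimal sketch of that idea; count_matches is a hypothetical helper name, and a plain string stands in for seq_record.seq so the snippet runs without a FASTA file.

```python
import re

def count_matches(pattern, sequence):
    # str() converts a Seq-like object into a plain string that re accepts;
    # for an argument that is already a string it changes nothing
    matches = re.findall(pattern, str(sequence), re.I)
    return len(matches)

# a plain string stands in for seq_record.seq here (assumption for the demo)
print("Matches = %d" % count_matches("taat", "TTAATT"))
```

In the loop from the question, the equivalent change would be re.findall(iupac, str(seq_record.seq), re.I).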
Hey!
How do I print the coordinates of each match?
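One way to get coordinates is re.finditer, which yields match objects instead of plain strings; each match object has start() and end() methods. Below is a rough sketch, not a definitive solution: match_coordinates is a hypothetical helper name, and the test sequence is made up. Positions are 0-based and the end position is exclusive, as in Python slicing. Note that finditer only reports non-overlapping matches.

```python
import re

def match_coordinates(pattern, sequence):
    # collect (start, end, matched_text) for every non-overlapping match;
    # start is 0-based and end is exclusive, like a Python slice
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(pattern, str(sequence), re.I)]

# made-up sequence standing in for str(seq_record.seq)
for start, end, text in match_coordinates("taat", "TTAATTGGTAATC"):
    print("%d-%d %s" % (start, end, text))
```

For 1-based coordinates, as commonly used in sequence annotation, add 1 to start and leave end unchanged.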