Hi all,
I'm trying to incorporate a regular expression command in a Biopython script. This produces an error:
AttributeError: 'str' object has no attribute 'id'
What I would like to do is to match a pattern within a Fasta file and replace the matching characters with other characters.
From this:
>BA_03462|gyrB Brenneria alni strain NCPPB
ATGTCGAATTCTTATGACTCCTCAAGTATCAAGGTATTGAAAGGGCTGGATGCGGTACGT
To this:
>BA|gyrB Brenneria alni strain NCPPB
ATGTCGAATTCTTATGACTCCTCAAGTATCAAGGTATTGAAAGGGCTGGATGCGGTACGT
Using the re module I can find and replace the pattern with this command:
matches = re.findall(r'_(.....)', str(seq_record))
for m in matches:
    change = str(seq_record), faa_filename.replace('_%s' % m, ' ')
The complete function is here:
def change_string():
    with open('outfile_padded.fasta') as f:
        for seq_record in SeqIO.parse(f, "fasta"):
            seq_record.id = seq_record.description = matches = re.findall(r'_(.....)', str(seq_record))
            for m in matches:
                change = str(seq_record), faa_filename.replace('_%s' % m, ' ')
                SeqIO.write(change, 'string.fasta', "fasta")

change_string()
However, the AttributeError arises because Biopython wants a SeqRecord-like object, but re wants a string. I've tried to modify the script but cannot find a way to please both modules.
Does anyone know a solution to this?
Thanks,
James
python --version: Python 3.6.8 :: Anaconda, Inc.
biopython==1.73
Red Hat 4.8.5-36
Do you absolutely need to use python? Would it not be easier to just use sed? Also, why not use re.sub(..., count=0)?
Building on RamRS's comment, why even use Biopython/SeqIO? Can't you just treat your data as a standard text file and blow through it line-by-line, avoiding any overhead from SeqIO.parse() (only really matters if your fasta is large)? I would also use sed for a quick turnaround.
While it is probably fine to do so in this case, I would contend that the better general advice is to always use a well trusted parser whenever possible...
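For reference, the line-by-line idea from the comments could be sketched roughly as follows. This is only an illustration: the helper name strip_header_suffix and the pattern (an underscore followed by digits, as in BA_03462) are assumptions, not something from the original script, and the sample record is the one from the question.

```python
import io
import re

# Assumption: the part to drop is "_" followed by digits, as in "BA_03462|gyrB".
SUFFIX = re.compile(r"_\d+")

def strip_header_suffix(lines):
    """Yield FASTA lines, removing the first '_digits' run from each header line."""
    for line in lines:
        if line.startswith(">"):
            # count=1 replaces only the first match and leaves the rest of the header alone
            line = SUFFIX.sub("", line, count=1)
        yield line

# In-memory stand-in for the FASTA file, using the record from the question
fasta_text = (
    ">BA_03462|gyrB Brenneria alni strain NCPPB\n"
    "ATGTCGAATTCTTATGACTCCTCAAGTATCAAGGTATTGAAAGGGCTGGATGCGGTACGT\n"
)
cleaned = "".join(strip_header_suffix(io.StringIO(fasta_text)))
print(cleaned, end="")
```

Sequence lines pass through untouched, so only lines starting with ">" are ever edited. With a real file, an open file handle can be passed in place of the io.StringIO object.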
Yes, it would probably be easier to use a sed or awk command. I was trying to keep this part of my pipeline in Python to avoid having to leave a single Python script, and I also want to learn more Python.
Would the re.sub command avoid using findall and replace?
Find matches to a regular expression + substitute = re.sub is the first thing that comes to my mind, as the substitute operation is not complex enough to warrant a find/match followed by a bunch of steps. From a cursory glance at the re documentation (I don't use python), it seems like the substitution argument can also be a method, which would address even complicated substitution problems. I see no reason to not use re.sub.
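A minimal sketch of what that could look like, using only the re module on the header text from the question. The variable names and the second pattern are illustrative assumptions. The first call reuses the '_' plus five characters pattern from the original post, and the second shows the replacement given as a function.

```python
import re

header = "BA_03462|gyrB Brenneria alni strain NCPPB"

# Plain string replacement: drop "_" plus the five characters after it (first match only).
simple = re.sub(r"_.....", "", header, count=1)

# Function replacement: re.sub calls the function with each match object
# and inserts whatever string it returns.
def keep_prefix_only(match):
    # group(1) is the letters before the underscore, e.g. "BA"
    return match.group(1)

# Assumed ID layout: letters, an underscore, then digits at the start of the header.
with_function = re.sub(r"^([A-Za-z]+)_\d+", keep_prefix_only, header)

print(simple)
print(with_function)
```

In the Biopython script, the same kind of call would presumably be applied to seq_record.id and seq_record.description, with the modified records themselves (rather than strings) then passed to SeqIO.write, since SeqIO.write expects SeqRecord objects.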