I have a FASTA file with numerous protein sequences (the header on one line and the amino acid codes on several lines). Some of these sequences contain ambiguous or exceptional amino acid codes (e.g., B, J, O, U, Z, X, --). I want to remove every sequence containing such a code and write the rest to a new FASTA file. How can I do this in Python? I managed to remove one amino acid code (X) at a time with the following code, but how can I remove them all at once?
from Bio import SeqIO
sequences = SeqIO.parse("sequences.fasta", "fasta")
filtered = [seq for seq in sequences if seq.seq.count('X') == 0]
with open('sequences_without_Xs', 'wt') as output:
    SeqIO.write(filtered, output, 'fasta')
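One possible way to extend the filter above to all unwanted codes at once is to invert the test: instead of listing every code to reject, keep only records whose residues all belong to the 20 standard amino acids. The sketch below is an illustration under assumptions, not a tested solution for your data. The names `STANDARD_AA`, `is_clean` and `filter_fasta` are made up for this example, and it assumes that lowercase residues should be treated like uppercase ones and that anything outside the 20 standard letters (including gap characters and `*`) should disqualify a sequence.

```python
# The 20 standard amino acid one-letter codes. Anything else
# (B, J, O, U, Z, X, '-', '*', ...) causes the sequence to be rejected.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def is_clean(sequence):
    """Return True if every residue is one of the 20 standard amino acids."""
    # str() accepts both plain strings and Biopython Seq objects.
    # Assumption: lowercase letters count the same as uppercase ones.
    return set(str(sequence).upper()) <= STANDARD_AA


def filter_fasta(in_path, out_path):
    """Copy only the clean records from in_path to out_path.

    Returns the number of records written.
    """
    # Imported here so is_clean() can be used without Biopython installed.
    from Bio import SeqIO

    clean = (rec for rec in SeqIO.parse(in_path, "fasta") if is_clean(rec.seq))
    return SeqIO.write(clean, out_path, "fasta")
```

With the file names from the question, this would be called as `filter_fasta("sequences.fasta", "sequences_without_Xs")`. The generator expression means the records are streamed rather than held in a list, which may matter for large files. Note that an empty sequence passes the check, so add a length test if that is not what you want.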
Thank you!! That works for every possible flaw.