I modified a script using BioPython to remove sequences consisting only of gaps from multi-FASTA files. But I'm struggling with how to loop this over multiple files, as I don't fully understand how BioPython works. How would I set the input and output files in this case?
Thanks, Jon
from Bio import SeqIO

INPUT = "test.fas"
OUTPUT = "test_output.fas"


def main():
    records = SeqIO.parse(INPUT, 'fasta')
    filtered = (rec for rec in records if any(ch != '-' for ch in rec.seq))
    SeqIO.write(filtered, OUTPUT, 'fasta')


if __name__ == "__main__":
    main()
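One way to handle several files is to notice that `SeqIO.parse` and `SeqIO.write` simply take a filename, so the per-file logic can become a function of an input path and an output path. You then call that function once per file. The sketch below assumes the files end in `.fas` and sit in a single input directory. It names each output `<stem>_output.fas`, mirroring `test_output.fas` above, and writes it into a separate output directory. The script name, directory layout and function names are illustrative assumptions, not part of the original script.

```python
import sys
from pathlib import Path

from Bio import SeqIO


def has_residues(record):
    """True if the record contains at least one non-gap character.

    Assumption: '-' is the only gap symbol, as in the original script.
    """
    return any(ch != "-" for ch in record.seq)


def filter_file(in_path, out_path):
    """Write the records of in_path to out_path, skipping gap-only ones.

    Returns the number of records written (SeqIO.write returns this count).
    """
    records = SeqIO.parse(str(in_path), "fasta")
    kept = (rec for rec in records if has_residues(rec))
    return SeqIO.write(kept, str(out_path), "fasta")


def main(in_dir, out_dir, pattern="*.fas"):
    """Apply filter_file to every file in in_dir matching pattern."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for in_path in sorted(in_dir.glob(pattern)):
        # Build the output name from the input name, e.g. aln1.fas -> aln1_output.fas
        out_path = out_dir / f"{in_path.stem}_output{in_path.suffix}"
        n = filter_file(in_path, out_path)
        print(f"{in_path.name}: wrote {n} records to {out_path}")


if __name__ == "__main__":
    # Hypothetical usage: python remove_gaps.py INPUT_DIR OUTPUT_DIR
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python remove_gaps.py INPUT_DIR OUTPUT_DIR", file=sys.stderr)
```

Keeping the input and output paths as function arguments, rather than module-level constants like `INPUT` and `OUTPUT`, is what makes the loop straightforward. The same `filter_file` can also be called directly on a single pair of files.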
Thanks to you both for the comments! If I may ask, am I onto something here?
I get this error message:
Run this and you will understand where the problem is.
Just a heads-up: filter is a built-in function in Python, so it's bad practice to shadow it with your own. Use another name.
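The follow-up code isn't shown here, but the advice above applies if a script defines its own function called `filter`. In that case, the module-level name no longer refers to Python's built-in `filter(function, iterable)`, and any code expecting the built-in breaks. A minimal illustration follows. The names `filter` (as a user function) and `drop_gap_only` are hypothetical examples.

```python
import builtins


# Hypothetical user function that shadows the built-in filter.
def filter(seqs):
    return [s for s in seqs if any(ch != "-" for ch in s)]


# In this module, the name "filter" now points at the function above.
print(filter is builtins.filter)  # False

try:
    # Code written for the built-in filter(function, iterable) now fails.
    filter(str.isalpha, ["a", "1"])
except TypeError as exc:
    print("TypeError:", exc)


# A distinct name avoids the clash and leaves the built-in usable.
def drop_gap_only(seqs):
    return [s for s in seqs if any(ch != "-" for ch in s)]


print(drop_gap_only(["AC-G", "----"]))  # ['AC-G']
print(list(builtins.filter(str.isalpha, ["a", "1"])))  # ['a']
```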