I am trying to create a function that takes a file and checks whether it is a valid FASTA file (for example, making sure there are no leading tabs or spaces, the first character is >, there are no empty lines between sequences, etc.).
I have tried using SeqIO.parse(filename, "fasta"), but it returned true for cases where the file had only the description line with > and no sequence provided.
I started coding this myself, but I was wondering whether there are other packages that check the validity of FASTA format?
Thanks.
You could try checking this with seqkit:
https://bioinf.shenwei.me/seqkit/usage/#seq
If you need empty records to be considered invalid, maybe you could issue a pull request to Biopython.
You could also subclass the SeqIO operations and extend the sequence-checking process to reject empty sequences, etc.
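If a hand-rolled check is acceptable, here is a minimal sketch of the rules listed in the question (no leading whitespace, data must start with a > header, no empty lines between records, no header without sequence). It uses only the standard library rather than Biopython, and the name validate_fasta, the returned (line number, message) tuples, and the choice to tolerate blank lines at the very end of the file are all my own assumptions, not part of any existing package. It does not check the sequence alphabet.

```python
import io


def validate_fasta(handle):
    """Return a list of (line_number, message) problems; an empty list means all checks passed.

    Hypothetical helper: the rules mirror the ones listed in the question.
    """
    problems = []
    header_line = None    # line number of the current record's header, if any
    has_sequence = False  # has the current record produced a sequence line yet?
    blank_at = None       # line number of a blank line not yet followed by content

    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")

        if line.strip() == "":
            # Remember the first blank line; it only counts as an error
            # if more content follows (trailing blank lines are tolerated).
            if blank_at is None:
                blank_at = lineno
            continue

        if blank_at is not None:
            problems.append((blank_at, "empty line inside the file"))
            blank_at = None

        if line[0] in " \t":
            problems.append((lineno, "leading whitespace"))
            line = line.lstrip()  # keep checking the rest of the line

        if line.startswith(">"):
            # A new header closes the previous record; it must have had sequence.
            if header_line is not None and not has_sequence:
                problems.append((header_line, "header without sequence"))
            header_line = lineno
            has_sequence = False
        elif header_line is None:
            problems.append((lineno, "sequence data before the first '>' header"))
        else:
            has_sequence = True

    if header_line is None:
        problems.append((0, "no '>' header found"))
    elif not has_sequence:
        problems.append((header_line, "header without sequence"))

    return problems


# Example usage on an in-memory file; a real file would be opened with open(path).
example = io.StringIO(">seq1 some description\nACGT\n>seq2\n")
for lineno, message in validate_fasta(example):
    print(f"line {lineno}: {message}")
```

Returning a list of problems instead of a single True/False makes it easier to see why a file was rejected; wrapping it as `not validate_fasta(handle)` gives a plain boolean if that is all you need.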