I have hundreds of FASTA files, each containing hundreds of sequences. The median sequence length differs between files. I would like to remove sequences shorter than the median (or the 75th-percentile) length of each file. So far, the scripts and tools I've come across, such as USEARCH, can only filter sequences by a fixed, user-defined length. I'm looking for any practical way to do this, including sed and awk. Any thoughts?
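The key point is that the cutoff is computed per file, from that file's own length distribution, rather than supplied by hand. A minimal sketch of that step, assuming "below" means strictly shorter than the cutoff (sequences equal to it are kept) and that the 75th percentile is taken by linear interpolation (`statistics.quantiles(..., method="inclusive")`; other tools may use a slightly different percentile convention). The function names and the two example length lists are hypothetical:

```python
import statistics


def length_cutoff(lengths, mode="median"):
    """Return one file's length threshold: its median or its 75th percentile."""
    if mode == "median":
        return statistics.median(lengths)
    if mode == "q75":
        if len(lengths) < 2:  # quantiles() needs at least two values
            return lengths[0]
        # method="inclusive" interpolates linearly between observed lengths;
        # index 2 of the three quartile cut points is the 75th percentile
        return statistics.quantiles(lengths, n=4, method="inclusive")[2]
    raise ValueError(f"unknown mode: {mode!r}")


def filter_by_cutoff(records, mode="median"):
    """Keep (header, sequence) pairs whose length is >= the file's own cutoff."""
    if not records:
        return []
    cutoff = length_cutoff([len(seq) for _, seq in records], mode)
    return [(h, s) for h, s in records if len(s) >= cutoff]


# Two hypothetical files with different length distributions
file_a = [120, 300, 310, 450, 500]
file_b = [900, 1000, 1100, 1500]
print(length_cutoff(file_a), length_cutoff(file_a, "q75"))  # 310 450.0
print(length_cutoff(file_b), length_cutoff(file_b, "q75"))  # 1050.0 1200.0
```

The same rule ("keep median and above") gives a cutoff of 310 for one file and 1050 for the other, which is why a single user-defined length does not fit this task.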
There are lots of relevant posts that may help you:
How To Filter Multi Fasta By Length??
A: Fasta Length
Thanks! But a user-defined length is not what I'm looking for, because it isn't practical to process hundreds of files one by one.
You could script it. For example, if you have `esl-seqstat` in your `$PATH`:

So, in the context of your hundreds of files:
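The `esl-seqstat` commands from this answer are not reproduced here. As an alternative that does not depend on `esl-seqstat`, here is a minimal pure-Python sketch of the same loop over many files. It assumes the inputs match `*.fasta` in one directory, writes each filtered file under the same name into a separate output directory, writes every sequence on a single (unwrapped) line, keeps sequences whose length is at least the file's own median or 75th percentile, and computes the percentile by linear interpolation. The script name `filter_by_length.py` and the function names are hypothetical:

```python
import statistics
import sys
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) pairs; multi-line sequences are joined."""
    records, header, chunks = [], None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line, []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def cutoff(lengths, mode):
    """Median, or 75th percentile by linear interpolation, of one file's lengths."""
    if mode == "median":
        return statistics.median(lengths)
    if len(lengths) < 2:  # quantiles() needs at least two values
        return lengths[0]
    return statistics.quantiles(lengths, n=4, method="inclusive")[2]


def filter_dir(in_dir, out_dir, mode="median", pattern="*.fasta"):
    """Filter every matching file in in_dir against its own length cutoff."""
    if mode not in ("median", "q75"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(in_dir).glob(pattern)):
        records = read_fasta(path)
        if not records:
            continue  # skip empty files rather than fail on median([])
        c = cutoff([len(seq) for _, seq in records], mode)
        with open(out / path.name, "w") as fh:
            for header, seq in records:
                if len(seq) >= c:
                    fh.write(f"{header}\n{seq}\n")


if __name__ == "__main__":
    # usage: python filter_by_length.py IN_DIR OUT_DIR [median|q75]
    filter_dir(sys.argv[1], sys.argv[2], sys.argv[3] if len(sys.argv) > 3 else "median")
```

For example, `python filter_by_length.py raw/ filtered/ q75` would keep, in each file, only the sequences at or above that file's 75th-percentile length. Holding one file in memory at a time is fine for files with hundreds of sequences; very large files would call for a two-pass approach instead.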
Simple and elegant! This is exactly what I'm looking for, thank you.