Is there a ready-made command-line program that can output sequence lengths from a FASTA file? Like so:
YKR054C,4092
YLR106C,4910
...
I've seen examples of people trying to roll their own using awk, but I would rather not do that if I can avoid it. Between EMBOSS, Biopython, BioPerl, etc., there must be something that can do this, right? What I'm looking for would be the equivalent of what blast_formatter can do for BLAST hits: something that can extract simple information that's already there, in a format specified by the user. Thank you.
"but I would rather not do that if I can avoid it": why?
Because I looked at the syntax for awk and it seemed rather messy (it didn't help that the name reminds me of "awkward"). I thought I was going to save myself time, but now it appears that awk is probably the most efficient way.
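For anyone who would still rather avoid awk, here is a minimal sketch in plain Python that produces the `ID,length` output asked for in the question. It is not taken from any of the toolkits mentioned above. The function name `fasta_lengths` is made up for illustration, and it assumes the sequence ID is the first whitespace-delimited token on each `>` header line and that every non-header line is sequence data.

```python
import sys


def fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            # Assumption: the ID is the first token after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            # Sequence data may span several lines; sum them up.
            length += len(line)
    # Emit the final record.
    if name is not None:
        yield name, length


if __name__ == "__main__" and len(sys.argv) > 1:
    # Reads the FASTA file given as the first command-line argument.
    with open(sys.argv[1]) as handle:
        for name, length in fasta_lengths(handle):
            print(f"{name},{length}")
```

Saved as, say, `fasta_lengths.py` (a hypothetical file name), it could be run as `python fasta_lengths.py sequences.fasta`. It streams the file line by line, so memory use should not grow with file size.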
see also: Code Golf: Mean Length Of Fasta Sequences