Hi,
I was about to re-invent the wheel, again, when I thought that there are probably many among you who have already solved this problem.
I have a few huge fasta files (half a million sequences) and I would like to know the average length of the sequences for each of these files, one at a time.
So, let's call this a code golf. The goal is to write very short code that accomplishes the following:
Given a fasta file, return the average length of the sequences.
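To make the task concrete, here is a deliberately un-golfed baseline sketch in Python. The function name `average_fasta_length` is only illustrative, and it assumes that header lines start with `>` and that every other line is sequence (possibly wrapped over several lines):

```python
def average_fasta_length(path):
    """Return the mean sequence length in a FASTA file (0.0 if it has no records)."""
    n_seqs = 0   # number of '>' header lines seen
    total = 0    # total number of sequence characters
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                n_seqs += 1
            else:
                # strip() removes the newline and any stray surrounding whitespace
                total += len(line.strip())
    return total / n_seqs if n_seqs else 0.0
```

Calling `average_fasta_length("reads.fasta")` (a hypothetical file name) returns the average as a float. Anything shorter than this counts as golf.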
The accepted answer will be the one with the most votes on Friday around 16:00 in Quebec (Eastern Time).
You can use anything that can be run in a Linux terminal (your favorite language, EMBOSS, awk...), diversity will be appreciated :)
Cheers
CONCLUSION:
And the correct answer goes to the most voted answer, as promised :) Thank you all for your great participation! I think I am not the only one to have appreciated the great diversity in the answers, as well as the very interesting discussions and ideas.
EDIT:
Although the question is now closed, do not hesitate to vote for all the answers you found interesting, especially those at the bottom! They are not necessarily less interesting. Often they simply arrived later :)
Thanks again!
A series of "code golf" tournaments might be a fun way to seed the proposed "Project Mendel"
I love the answers to this question! Amazing diversity: we get the perl, python, R scripts then .... whoa ... flex, clojure, erlang, haskell, ocaml, memory mapped C files.
Create a 'flat' fasta for the human genome that for each chromosome contains the entire sequence as a single line. Now run the tools below and see which one can still do it.
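Line-based readers have to hold each chromosome in memory as a single string, which is where they may struggle. As a rough sketch (not taken from any of the answers), one way around that in Python is to read fixed-size binary chunks and keep only a little state between them. The function name `average_length_chunked` and the `chunk_size` default are made up for illustration, and it assumes a header is any line whose first character is `>`:

```python
def average_length_chunked(path, chunk_size=1 << 20):
    """Mean FASTA sequence length without ever holding a whole line in memory."""
    n_seqs = 0
    total = 0
    in_header = False      # currently inside a '>' header line
    at_line_start = True   # next byte is the first byte of a line
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            pos = 0
            while pos < len(chunk):
                if at_line_start and chunk[pos:pos + 1] == b">":
                    in_header = True
                    n_seqs += 1
                nl = chunk.find(b"\n", pos)
                end = len(chunk) if nl == -1 else nl
                if not in_header:
                    # count sequence bytes, ignoring carriage returns
                    total += (end - pos) - chunk.count(b"\r", pos, end)
                if nl == -1:
                    # the line continues in the next chunk
                    at_line_start = False
                    pos = len(chunk)
                else:
                    in_header = False
                    at_line_start = True
                    pos = nl + 1
    return total / n_seqs if n_seqs else 0.0
```

Memory use is bounded by `chunk_size` regardless of how long the lines are, at the price of a small state machine instead of a one-liner.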
This is one of the best posts in the forum! Opened my eyes to the full universe of programming approaches. Can I +1 code golf as well as this post?
I would be incredibly impressed to see an answer in BF, Var'aq or Lolcode ... not for usability but just to see a "real-world" application.
Kind of what I had in mind :) Something like doing one code golf per week and slowly building up a set of fun problems. Maybe there should be an off-Biostars group for discussing problem proposals and how to formulate them. For example, a Google group... What do you think?