Can You Improve This Erlang Code? (Was "Average Length Of The Sequences In A Fasta File")
14.4 years ago

In a previous question "Code golf: mean length of fasta sequences", Eric asked for some solutions to get the average length of the sequences in a fasta file.

I tried to answer this question using the following Erlang code:

-module(golf).
-export([test/0]).

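%% Fold one input line into the {Sequences,Total} accumulator.
line([],{Sequences,Total}) ->  {Sequences,Total};
%% A header line (starting with ">") begins a new sequence.
line(">" ++ _Rest,{Sequences,Total}) -> {Sequences+1,Total};
%% Any other line adds its length. Note: string:strip/1 removes only
%% spaces, so the trailing newline is still counted here.
line(L,{Sequences,Total}) -> {Sequences,Total+string:len(string:strip(L))}.

%% Read lines from device S until eof, accumulating the counts.
scanLines(S,Sequences,Total)->
        case io:get_line(S,'') of
            eof -> {Sequences,Total};
            {error,_} ->{Sequences,Total};
            Line -> {S2,T2}=line(Line,{Sequences,Total}), scanLines(S,S2,T2)
        end.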

test()->
    {Sequences,Total}=scanLines(standard_io,0,0),
    io:format("~p\n",[Total/(1.0*Sequences)]),
    halt().

Compilation/Execution:

erlc golf.erl
erl -noshell -s golf test < sequence.fasta
563.16

This code seems to work fine for a small FASTA file, but it takes hours to parse uniprot_sprot.fasta (in fact, I pressed Ctrl-C). Why? I'm an Erlang newbie; can you improve this code?

Pierre, in the meantime you should maybe post your question on Stack Overflow as well. Just a suggestion.

Fred, I'll do that if I don't get an answer here :-)

The answer is here

A few years ago I read a blog post series on Erlang's text processing performance. Back then it seemed that the language did not have an efficient string representation.
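For what it's worth, Erlang strings are linked lists of characters, and binaries are the usual way around that. Below is a minimal sketch of the same count done on binaries, not a tested fix for the code in the question. The module name fasta_mean and the functions mean_length/1 and residues/1 are made up for illustration. It assumes the input is read from a file path rather than from standard input, and it skips all whitespace inside a sequence line (the original only strips spaces at the ends). It has not been benchmarked against uniprot_sprot.fasta.

```erlang
-module(fasta_mean).
-export([mean_length/1, residues/1]).

%% Hypothetical helper: mean sequence length of the FASTA file at Path.
%% Returns 'undefined' when the file contains no header line.
mean_length(Path) ->
    %% raw + binary + read_ahead: buffered reads that return binaries
    %% instead of character lists.
    {ok, Fd} = file:open(Path, [read, raw, binary, read_ahead]),
    try count(Fd, 0, 0) of
        {0, _} -> undefined;
        {Sequences, Total} -> Total / Sequences
    after
        file:close(Fd)
    end.

%% Walk the file line by line, carrying {Sequences, Total} as arguments.
count(Fd, Sequences, Total) ->
    case file:read_line(Fd) of
        eof -> {Sequences, Total};
        {ok, <<">", _/binary>>} -> count(Fd, Sequences + 1, Total);
        {ok, Line} -> count(Fd, Sequences, Total + residues(Line));
        {error, Reason} -> error(Reason)
    end.

%% Number of non-whitespace bytes in a line (space, tab, CR, LF skipped).
residues(Line) -> residues(Line, 0).

residues(<<C, Rest/binary>>, N)
  when C =:= $\s; C =:= $\t; C =:= $\r; C =:= $\n ->
    residues(Rest, N);
residues(<<_, Rest/binary>>, N) -> residues(Rest, N + 1);
residues(<<>>, N) -> N.
```

Once compiled with erlc fasta_mean.erl, it could be called as fasta_mean:mean_length("sequence.fasta"). Because the newline is no longer counted, the result may differ slightly from the figure printed by the original code.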
