Question

Instaling Reas Repeat Finding

3

Entering edit mode

14.6 years ago

Darked89 4.7k

I am trying to run ReAS, de novo repeat library construction package. My steps are described here:

http://openwetware.org/wiki/Wikiomics:Repeat_finding#ReAS

I got no errors running make (after changing one line, see above link), and several steps processing steps seems to be running OK. But when I run

nohup reas_all.pl -read 454_subset_4repeat_search.fas -pa 8 -output \ 
    454_subset_4repeat_search.fas.cons_reas   2 > nohup.reas1.txt &

I end up with:

/usr/local/bin/multi_run2.sh: line 72: dust: command not found

This is strange, since I do have dust (from wublast) on the PATH.

Line 72 multi_run2.sh is just eval $comm

The relevant part of the code is in run_SegLink.pl:

`split_fa_bygroup.pl ../$read_file $Nfiles 1`;
open (Fout, ">sh") || die "can not write file sh\n";

for (my $i=1; $i<=$Nfiles; $i++) {    
    print Fout "dust leaf.$i.fa 50 >$i.mask &\n";
}

`cat sh|multi_run2.sh -n $Nprocess -w 2\n`;
close Fout;
`cat *.mask >seg.mask.fa`;

my $N_out_file = $Nprocess * 10;
my $add = 0;

Two questions:

anybody managed to get ReAS working? If yes, how?
any ideas why I am getting "no dust" error?

EDIT: Perl code changed (previous one got mangled while doing copy-paste) See Michael comment below.

EDIT 2: title change (it is not about dust program anymore)

repeats next-gen-sequencing • 5.2k views

ADD COMMENT • link updated 12 months ago by Ram 44k • written 14.6 years ago by Darked89 4.7k

1

Entering edit mode

Sorry but this doesn't look like valid perl to me: for (my $i=1; $i$i.mask &n"; } Are your sure this is correct? Anyway, I have no idea about this tool, after seeing this code snippet, whatever it does, I would abstain from ever using it unless forced by threats to life ;D

ADD REPLY • link 14.6 years ago by Michael 55k

0

Entering edit mode

@Michel: my Firefox crashed while posting, I completed the post not noticing that code is mangled somehow. Sorry.

ADD REPLY • link 14.6 years ago by Darked89 4.7k

0

Entering edit mode

@Michel: now I know pre tag did not protect messing up HTML by Perl code.

ADD REPLY • link 14.6 years ago by Darked89 4.7k

Ram · Answer 1 · 2010-04-16

3

Entering edit mode

14.6 years ago

Michael 55k

Well, well, I see. The author of such a script is probably in urgent need of a perl course (rofl) What is done is: the perl script creates a shell script with some shell commands, adds "&" to fork them off and passes this to another shell script via stdin piping "|" that just evals the lines in (probably) bash syntax given to it. This is achieved by back-tick evaluation. The author nicely presents some of the worst practices in perl programming, unaware of such useful perl functions such as exec, system and fork. And somewhere on the way, the PATH environment is probably unfortunately lost. Anyway even though crude, this should work at a first glance. So check the following:

In the console, try which dust. If you don't get the path, the script cannot. Then try e.g. locate dust

As a quick fix, replace:

print Fout "dust leaf.$i.fa 50 >$i.mask &\n";

with

print Fout "/path/to/dust leaf.$i.fa 50 >$i.mask &\n";

better fix: make configuration variable for all executables that are called from inside it. Define them in the beginning of the script or in a separate config.pm and use something like:
```
print Fout "$dustexecutable leaf.$i.fa 50 >$i.mask &\n";
```
Even better: Try to re-write the code to make use of system and fork and to do all the things that are done purely in perl.

(sorry, no offence...)

ADD COMMENT • link updated 6.2 years ago by Ram 44k • written 14.6 years ago by Michael 55k

0

Entering edit mode

in the same shell window I was getting dust to execute. Same peeve with mixing shell scripting & perl/python. Applied 1 & 2, will know if this solves the problem in few hours.

ADD REPLY • link 14.6 years ago by Darked89 4.7k

0

Entering edit mode

Good news: it did not print "no dust" this time. Bad news: finished without printing any other error, but also without producing final output. 2.8k lines of Perl in 32 files + one shell file...

ADD REPLY • link 14.6 years ago by Darked89 4.7k

0

Entering edit mode

Why didn't this suprise me? And again, discovered just another case of a vapour-software publication (sometimes called 'misconduct'). Maybe send a nice mail to the authors, including a link to this question. You can also leave a comment on the PLoS-Comp. Biology site maybe.

ADD REPLY • link updated 5.3 years ago by Ram 44k • written 14.6 years ago by Michael 55k

0

Entering edit mode

Well, there has been ppl who managed to make ReAS work, but somehow this skill is rare... I posted few authors who did just that, maybe they will answer. re PLOS-Comp: I will post the links.

ADD REPLY • link 14.6 years ago by Darked89 4.7k

Ram · Answer 2 · 2010-07-09

late follow up:

I run ReAS in a single process mode:
```
nohup reas_all.pl -read 454_subset_4repeat_search.fas  -output 454_subset_4repeat_search.fas.cons_reas   2 > nohup.reas8.txt &
```
This does not change much, but helps to exclude any errors caused by executing commands in parallel
to my bewilderment our sys admins did not link all binaries from cross_match directory to bin, so I had no cross_match.manyreads on my PATH... Fixed.

run_SegLink.pl from ReAS writes lovely named temp.sh with a following command:

cross_match.manyreads leaf.1.fa leaf.1.fa -masklevel 101 -minmatch 14 -minscore 30 2>/dev/null |ConvertAlign.pl -s 100 -i 0.6 -t 0 | rmLCSlink.pl -m seg.mask.fa -j 50 | align_to_binary >>0.link &

This makes sure that if there are any problems with cross_match.manyreads then you can not read them.

Executing part of it by hand:

cross_match.manyreads leaf.1.fa leaf.1.fa -masklevel 101 -minmatch 14 -minscore 30  > cross_match.out

gives what was wrong:

1.008 Mbytes allocated -- total 3647.952 Mbytes

FATAL ERROR: MAX_NUM_BLOCKS exceeded

in summary: ReAS produces no sensible result because at this moment cross_match.manyreads can not take 1.8Gb fasta files as an input.
Phrap/cross_match distribution has a file db.c defining MAX_NUM_BLOCKS = 1000. Simple number increase to 4000 gives the same error, 5000 does not work:
```
1 1784847673 -725271951
FATAL ERROR: seq_area_size
```
seq_area_size is declared in swat.h as int. Changing everything which deals with seq_area_size from int to big int does not look good without consulting program author.

EDIT:

decreasing the size of the fasta input file (8x, from 2.1Gb to 265Mb/592k 454 reads). ReAS is executing "run" file:
```
cross_match.manyreads leaf.1.fa ../HD_reads.fa -masklevel 101 -minmatch 17 -minscore 30  2>/dev/null |ConvertAlign.pl -s 100 -i 0.6 -t 0 |HighDseg -s 100 -i 0.6 -d 6 >1.bk &
```
It is still running cross_match.manyreads part @100% CPU, close to 20h by now on Xeon @2.93GHz Running it in a single process mode => slow.