ALLPATHS-LG is slow: is multithreading possible?
8.2 years ago
Picasa ▴ 650

Does anybody have experience with ALLPATHS-LG?

Is it possible to run it in multithreaded mode?

In particular, the first step, PrepareAllPathsInput, takes a very long time. It has been running for 2 days; is that normal?

The command is:

PrepareAllPathsInput \
  DATA_DIR=$PWD/rep \
  PLOIDY=1 \
  IN_GROUPS_CSV=/rep/in_groups.csv \
  IN_LIBS_CSV=rep/in_libs.csv \
  OVERWRITE=True \
  | tee prepare.outre
Assembly allpaths • 2.7k views
ADD COMMENT
8.2 years ago
thackl ★ 3.0k

It is a bit tricky, but you can speed up data import by setting up a cache first with CacheLibs.pl/CacheGroups.pl. See "ALLPATHS Cache for power users", page 19 in the manual for details.

With those scripts you can add different libraries individually and in parallel, and CacheGroups.pl also allows forking of library import to parallel jobs.

ADD COMMENT

Thanks for your support.

1) What do you mean by "you can add different libraries individually and in parallel"? Do I have to create an in_groups.csv and an in_libs.csv for each library and run CacheLibs.pl/CacheGroups.pl separately?

2) "CacheGroups.pl also allows forking of library import to parallel jobs": is that done with the "HOSTS" option? I'm not sure how to use it (I work on a cluster).

ADD REPLY

1) Yes, that's what I did. I split my original files into several groups/libs CSV files and imported those.

2) As far as I understand, if you provide just a single number, it sets the number of forked processes (libraries processed in parallel) on the current node, i.e. it should be more or less equivalent to setting the number of threads.
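
For point 1, here is a rough sketch of how the splitting could be scripted. It only produces the per-library CSV files and does not call any ALLPATHS-LG script. It assumes the CSV has a header line containing a column named library_name and that library names are safe to use in file names; the function name split_by_library and the output naming scheme are made up for this example.

```shell
# Sketch: split a CSV (e.g. in_groups.csv) into one CSV per library.
# Assumption: the header line has a column called "library_name".
# split_by_library is a hypothetical helper, not part of ALLPATHS-LG.
split_by_library() {
    in_csv=$1
    out_dir=$2
    prefix=$(basename "$in_csv" .csv)
    mkdir -p "$out_dir"
    awk -F',' -v out="$out_dir" -v prefix="$prefix" '
        NR == 1 {
            header = $0
            # Locate the library_name column from the header.
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                if (name == "library_name") col = i
            }
            if (!col) { print "no library_name column" > "/dev/stderr"; exit 1 }
            next
        }
        NF < col { next }                        # skip blank or short lines
        {
            lib = $col
            gsub(/^[ \t]+|[ \t]+$/, "", lib)     # trim spaces around the name
            if (lib == "") next
            file = out "/" prefix "." lib ".csv"
            # First row for this library: start its file with the header.
            if (!(lib in seen)) { print header > file; seen[lib] = 1 }
            print > file
        }
    ' "$in_csv"
}

# Usage (paths are placeholders):
#   split_by_library in_groups.csv split/
#   split_by_library in_libs.csv split/
```

Each resulting pair of files can then be imported on its own, which is what makes running the imports side by side possible.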

ADD REPLY
8.2 years ago
caizexi123 ▴ 60

Hi,

I think you need to check whether something is wrong. Normally PrepareAllPathsInput takes less than two days to prepare 300G of data.

ADD COMMENT

So it looks like I am stuck at the MergePairedFastbs step. What is your command?

ADD REPLY

The same as yours. I think you need to check whether your server I/O or network has a problem.

Mon Apr 11 17:13:24 2016 (MPF): Merging.
Mon Apr 11 19:41:19 2016 (MPF): Merged 816052282 read pairs.
Mon Apr 11 19:41:19 2016 (MPF): Done.

This is the normal speed. If your file is being written at only a few M per hour, then there is a problem.
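
To put a number on "normal speed", the two timestamps and the pair count from the log above can be turned into a rate. This is only a small arithmetic sketch; the helper names to_seconds and pairs_per_second are made up here, and it assumes both timestamps fall on the same day.

```shell
# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
    old_ifs=$IFS; IFS=:; set -- $1; IFS=$old_ifs
    # Strip one leading zero so values like 08 are not read as octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

# Integer read pairs per second between two same-day timestamps.
pairs_per_second() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    echo $(( $3 / (end - start) ))
}

# Values taken from the MPF log lines above.
pairs_per_second 17:13:24 19:41:19 816052282   # prints 91949
```

So the merge ran for 8875 seconds at roughly 92,000 read pairs per second.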

ADD REPLY

I see... It's very fast on your side.

Do you think it's an I/O problem?

ADD REPLY

I don't know, because I never fixed it; in the end I changed servers. I think something goes wrong when writing big files, because the conversion to fastb that has to happen before MergePairedFastbs wrote at normal speed.

ADD REPLY

Just for information, can you tell me how long it takes you to run FastqToFastbQualb?

ADD REPLY

Around 40~45 minutes for 816052282 read pairs.

ADD REPLY

Ok that's weird.

We have several servers (from 512G to 1T of RAM, with more than 64 CPUs), but it took me 10 hours to process 150M read pairs, and that was just the FastqToFastbQualb script.

ADD REPLY

I did experience some weird problems with ALLPATHS-LG in the past on some machines that could be similar to what you are experiencing - I never really figured out the real reason or how to fix it, though. In my case, once a certain amount of RAM was loaded with data (ALLPATHS reads entire libraries in RAM), the process kind of stalled, i.e. no more I/O activity and no CPU activity, just sitting there with high memory demands. Check top or similar, to see what your jobs are doing.
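
If you prefer something scriptable over watching top, a sketch like the following samples one process a few times with ps. The helper name sample_proc is made up for this example, and the output columns assume a ps that accepts the -o format list shown (procps on Linux does).

```shell
# Print PID, state, CPU %, resident memory (KiB) and command name
# for one process, COUNT times, INTERVAL seconds apart.
# sample_proc is a hypothetical helper name.
sample_proc() {
    pid=$1
    interval=${2:-5}
    count=${3:-3}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Fails (and stops sampling) once the process no longer exists.
        ps -o pid=,stat=,pcpu=,rss=,comm= -p "$pid" || return 1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
}

# Usage (12345 is a placeholder PID):
#   sample_proc 12345 10 6
```

A state starting with D (uninterruptible sleep) together with near-zero CPU usually points at the process waiting on I/O, while a stalled job like the one I describe would show high RSS and no CPU activity.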

ADD REPLY

OK, let me write down what happened in my case. I am using a cluster. I had used ALLPATHS-LG before and everything worked normally. Then we got more data (a small data set of long-insert mate-pair reads), and I wanted to rerun the assembly. Around that time, the whole cluster went through maintenance. After the maintenance I reran it and something odd happened: MergePairedFastbs wrote only a few Mb per hour. I tried different nodes, and they all behaved the same. In the end, I had to use a local server. By the way, I also tried three different versions of ALLPATHS-LG. I contacted the administrator, but he said he had not changed anything on the cluster, and that the process was running, just very slowly.
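
A simple way to measure this kind of slow output is to check how much a file grows over a fixed interval. This is just a sketch; growth_rate is a made-up helper name, the interval must be a whole number of seconds, and the path in the usage line is a placeholder.

```shell
# Print how fast FILE grows, in bytes per second, over INTERVAL seconds.
# growth_rate is a hypothetical helper name.
growth_rate() {
    file=$1
    interval=${2:-60}
    before=$(wc -c < "$file")
    sleep "$interval"
    after=$(wc -c < "$file")
    echo $(( (after - before) / interval ))
}

# Usage (placeholder path):
#   growth_rate /path/to/output.fastb 60
```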

ADD REPLY

I found the solution. Now it runs perfectly!

Just one question: how much disk space does the complete assembly take with your data?

ADD REPLY

What was your solution?

Up to 5 TB for a 3 Gbp plant genome.

ADD REPLY

My storage disk was too slow. I had to run it from the local disk.
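
For anyone who wants to compare two locations before starting a run, a rough sequential write test could look like the sketch below. It is not how I diagnosed my own case, just an illustration. write_test is a made-up helper name, conv=fsync assumes GNU dd, and the directories in the usage lines are placeholders.

```shell
# Write SIZE MiB of zeros into DIR and print dd's summary line
# (bytes copied, elapsed time, throughput), then clean up.
# write_test is a hypothetical helper; conv=fsync assumes GNU dd.
write_test() {
    dir=$1
    mib=${2:-256}
    tmp=$(mktemp "$dir/write_test.XXXXXX") || return 1
    # conv=fsync makes dd flush the data to disk before it reports timing,
    # so the page cache does not hide a slow file system.
    dd if=/dev/zero of="$tmp" bs=1048576 count="$mib" conv=fsync 2>&1 | tail -n 1
    rm -f "$tmp"
}

# Usage (placeholder directories):
#   write_test /shared/storage/project 1024
#   write_test /local/scratch 1024
```

A single large sequential write will not capture every access pattern of the assembler, so treat the numbers as a first indication only.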

ADD REPLY

Even with top, what I see is that the jobs are running. It writes the file, but it is very, very slow.

Do you use the latest version, i.e. 52488?

ADD REPLY
