Hi ABySS,
I used a HiSeq X Ten to generate M001_R1.fastq (170 GB, 450,000,000 reads) and M001_R2.fastq (170 GB, 450,000,000 reads), giving 37x coverage; 95% of reads are > Q30 and most are 150 bp long.
My compute resource is a single machine with 64 cores and 520 GB of total memory. My simple question is whether I can assemble these two FASTQ files with ABySS 1.9.0.
I ran some tests on the data with 4 and 8 cores on my computer and got the following results:
1. Used 4 cores
$ cat abysspe91.sh.o3767790
/opt/openmpi/bin/mpirun -np 4 ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s FR07886691_Human_WEEJAR_R1R2-bubbles.fa -o FR07886691_Human_WEEJAR_R1R2-1.fa
M001_R1.fastq
M001_R2.fastq
ABySS 1.9.0
ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s FR07886691_Human_WEEJAR_R1R2-bubbles.fa -o FR07886691_Human_WEEJAR_R1R2-1.fa
M001_R1.fastq
M001_R2.fastq
Running on 4 processors
0: Running on host omega-0-9.local
1: Running on host omega-0-9.local
2: Running on host omega-0-9.local
3: Running on host omega-0-9.local
0: Reading `HCCJFCCXX_2_150527_FR07886691_Human__R_150526_WEEJAR_FGS_M001_R1.fastq'...
1: Reading `HCCJFCCXX_2_150527_FR07886691_Human__R_150526_WEEJAR_FGS_M001_R2.fastq'...
[cut]
0: Read 6900000 reads. 0: Hash load: 229724096 / 536870912 = 0.428 using 8.01 GB
1: Read 7000000 reads. 1: Hash load: 229433787 / 536870912 = 0.427 using 8 GB
[The job stopped here with no error message; I think it ran out of memory.]
2. Used 8 cores
$ cat abysspe81_eager.sh.o3769063
/opt/openmpi/bin/mpirun -np 8 ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s FR07886681_Human_WEEJAR_R1R2_T8-bubbles.fa -o FR07886681_Human_WEEJAR_R1R2_T8-1.fa M001_R1.fastq
M001_R2.fastq
ABySS 1.9.0
ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s FR07886681_Human_WEEJAR_R1R2_T8-bubbles.fa -o FR07886681_Human_WEEJAR_R1R2_T8-1.fa
M001_R1.fastq
M001_R2.fastq
Running on 8 processors
0: Running on host omega-0-17.local
1: Running on host omega-0-17.local
2: Running on host omega-0-17.local
3: Running on host omega-0-17.local
4: Running on host omega-0-17.local
5: Running on host omega-0-17.local
6: Running on host omega-0-17.local
7: Running on host omega-0-17.local
0: Reading `HCCJFCCXX_1_150527_FR07886681_Human__R_150526_WEEJAR_FGS_M001_R1.fastq'...
1: Reading `HCCJFCCXX_1_150527_FR07886681_Human__R_150526_WEEJAR_FGS_M001_R2.fastq'...
[cut]
0: Read 16800000 reads. 0: Hash load: 232538291 / 536870912 = 0.433 using 8.11 GB
1: Read 16500000 reads. 1: Hash load: 231000632 / 536870912 = 0.43 using 8.06 GB
0: Read 16900000 reads. 0: Hash load: 233529399 / 536870912 = 0.435 using 8.15 GB
[The job stopped here with no error message; I think it ran out of memory.]
From this I concluded that even if I used 64 cores, ABySS would load only 132,000,000 reads of each FASTQ file, so my job would fail with ABySS.
Again, can you help me get a genome assembly? I would really like to use ABySS because of its high accuracy. I tried splitting each large file into 60 smaller files, but ABySS still used the same amount of memory to load them. Thank you in advance.
Cheers,
Weerachai
It is not clear whether "your computer", on which you ran the tests, is the same machine as the one with 64 cores and 520 GB of memory. How much memory was available for the test runs?
Also, did you perform quality checking and trimming, adapter trimming, error correction, or perhaps digital normalization? These steps should lower the memory requirements.
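As a very rough sketch of what such a pre-processing pass could look like (the tool choices, thread count, adapter file and all output names below are my own assumptions, not anything from your runs, so substitute whatever you have locally):

# 1) adapter and quality trimming with Trimmomatic in paired-end mode
java -jar trimmomatic-0.39.jar PE -threads 16 \
    M001_R1.fastq M001_R2.fastq \
    M001_R1.trim.fastq M001_R1.unpaired.fastq \
    M001_R2.trim.fastq M001_R2.unpaired.fastq \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:50

# 2) optional digital normalization with BBNorm: capping k-mer depth
#    shrinks the k-mer table the assembler has to hold in memory
bbnorm.sh in=M001_R1.trim.fastq in2=M001_R2.trim.fastq \
    out=M001_R1.norm.fastq out2=M001_R2.norm.fastq \
    target=40 min=3

Digital normalization changes the coverage profile, so treat it as optional; adapter and quality trimming are worth doing in any case.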
Thanks for this. Actually, I am thinking of doing these now. I did trim the adapters, but not the other steps, since I can see that ABySS can do the quality checking.
That is interesting! In your case, how much can error correction and digital normalisation cut down the number of reads?
Cheers, weerachai
I wonder what the output of "free -g" is when ABySS hangs. That would tell you whether it has actually run out of memory, whether it has started writing to swap (the latter would explain why it seemingly hangs: it just takes forever once it starts writing to disk), or whether there is actually still enough memory left.
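If you can edit the SGE job script, a minimal sketch like the one below (my suggestion only; "mem_usage.log" and the 60-second interval are arbitrary) would record "free -g" alongside the run, so you could see afterwards whether memory or swap was exhausted at the point it stalled:

# background loop: log date, memory and swap every 60 s
( while true; do date; free -g; echo; sleep 60; done ) > mem_usage.log 2>&1 &
MEMLOG_PID=$!

# ... the existing mpirun / ABYSS-P command from your script runs here, unchanged ...

# stop the logger once the assembly step has finished
kill "$MEMLOG_PID"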
Hi Philipp,
I would like to know too, but to my knowledge it is difficult to check. I am using a cluster shared by many other people, and a request for 64 cores means a wait in the queue that I consider reasonable. I submitted my job to SGE with qsub. All compute nodes have 64 cores and 520 GB of memory, and they are probably configured to give a job all of its assigned CPU resources until it finishes. The details I know are as follows:
Cheers,
weerachai
SGE lets you redirect stdout and stderr to files; did you check those files for errors?
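A sketch of what I mean, going only by the generic qsub documentation (the job name, output file names, parallel environment name "smp" and slot count are placeholders; your site will use different ones):

#!/bin/bash
#$ -cwd                  # run the job in the submission directory
#$ -N abyss_test         # job name
#$ -o abyss_test.out     # stdout is written here
#$ -e abyss_test.err     # stderr is written here
#$ -pe smp 8             # parallel environment and slot count (site-specific)

# ... the mpirun / ABYSS-P command goes below, unchanged; SGE sets $NSLOTS
#     to the number of slots granted, which can be passed to mpirun -np ...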
I think I deleted the STDERR files, but I did check both. They contained nothing (0 bytes).
Weerachai
I have never used SGE, but Torque sends these messages to the output file; in fact, I got one today:
Are you redirecting the ABySS output to a file? It is difficult to troubleshoot when there is no output and no clues.
Thanks, h.mon, for your great support.
1. I reran the test and got these:
As you can see, there was nothing in the *.pe and *.e files, and the *.o file stopped at loading reads again.
2. I ran another test on the login node, without submitting to SGE, and got the following:
It seems clear to me that it is something about memory usage, if I consider the following total memory of the login node:
Cheers,
Weerachai
It could be that the system administrators set limits on the memory that users of the login node can use, considering that it is just for submitting jobs. So the node may have enough memory, but you are not allowed to use it. Back at UQ I got angry automated emails when I ran tasks on login nodes...
"ulimit -a" may tell you more about your allowed limits, but I wouldn't use the login node for anything. Can you ssh into your compute node while the job is running?
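If not, one workaround (just a sketch, assuming you can edit your submit script) is to print the limits and memory of whichever node the job lands on at the top of the script, so they end up in the job's .o file:

# put this at the top of the submit script, before the mpirun command
echo "== limits on $(hostname) =="
ulimit -a
echo "== memory on $(hostname) =="
free -g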
Right now I have no clue about the IP addresses or ssh-able hostnames of the compute nodes, and I think the admin would not give them to me. Thanks anyway, Philipp.
You can try to start an interactive job, but I do not know how to do that on SGE. You will attract the admins' ire if you keep running ABySS on the login node; depending on local policies and how much you abuse it, you could be blocked from using the cluster.
P.S.: have you been using the login node all this time?
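(Going only by the generic SGE documentation, which I have not used myself: interactive sessions are usually started with qlogin or qrsh; the parallel environment name "smp" and the memory request below are guesses and will be site-specific.)

# request an interactive shell on a compute node (option values are guesses)
qlogin -pe smp 4 -l h_vmem=8G
# or, on sites where qlogin is disabled:
qrsh -pe smp 4 -l h_vmem=8G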
Using the login nodes for testing is okay here, and the ABySS tests ran for only a few hours before they stopped. Weerachai
Have you solved your problem?
I am facing the same problem. I am going to try splitting the FASTQ files into smaller ones, but I don't know whether it is feasible to join the resulting assemblies.