Dear Abyss developers,
Background: I recently used ABySS 2.0.2 successfully to assemble my SE (25x), PE (25x) and MP (50x) reads into an assembly with a scaffold N50 of 2 Mb, which is relatively good. However, the unitig N50 is only 4 kb, so the scaffolds contain many Ns. It seems that ABySS aligns the PE and MP reads only to the unitigs assembled from the SE reads, so a large proportion of the PE and MP reads are wasted (reported in the 'Different' category). For example, fewer than 40% of the reads in the 10 kb MP libraries can be aligned. Given my very low SE coverage, I feel the SE unitigs are not ideal starting material.
Approach: I am trying to concatenate all SE, PE and MP reads into "super-SE" reads to increase the unitig N50 and thereby improve the subsequent PE and MP alignment rates. I have done very strict quality control on my MP reads to remove Nextera adaptors and transposase sequences, so I don't expect any chimeras (reads that join two fragments lying far apart). After constructing the super-SE reads by concatenating all of the fastq.gz files, I redid the assembly with the following command:
abyss-pe np=16 name=SWS k=66 pe='pe1' mp='mp1 mp2 mp3 mp4' \
se='SWS_super_SE.trimmomatic.fq.gz' \
pe1='SWS_PE_1.trimmomatic.fq.gz SWS_PE_2.trimmomatic.fq.gz' \
mp1='SWS_MP_1-4Kb_1.trimmomatic.fq.gz SWS_MP_1-4Kb_2.trimmomatic.fq.gz' \
mp2='SWS_MP_4-7Kb_1.trimmomatic.fq.gz SWS_MP_4-7Kb_2.trimmomatic.fq.gz' \
mp3='SWS_MP_7-10Kb_1.trimmomatic.fq.gz SWS_MP_7-10Kb_2.trimmomatic.fq.gz' \
mp4='SWS_MP_10-15Kb_1.trimmomatic.fq.gz SWS_MP_10-15Kb_2.trimmomatic.fq.gz'
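For reference, the concatenation step described above could be scripted as follows. This is a minimal sketch rather than part of ABySS, and the function name build_super_se is hypothetical. It relies on the fact that byte-level concatenation of gzip files yields a valid multi-member gzip stream, so the inputs do not need to be decompressed and recompressed.

```shell
# Hypothetical helper (not part of ABySS): concatenate gzipped FASTQ files
# into one "super-SE" file. Usage: build_super_se OUTPUT INPUT...
build_super_se() {
  out=$1
  shift
  # Without inputs, cat would silently read stdin instead.
  if [ "$#" -eq 0 ]; then
    echo "usage: build_super_se OUTPUT INPUT..." >&2
    return 1
  fi
  # Refuse to write the output over one of the inputs.
  for f in "$@"; do
    if [ "$f" = "$out" ]; then
      echo "build_super_se: output must differ from the inputs" >&2
      return 1
    fi
  done
  # Byte-level concatenation produces a multi-member gzip stream.
  cat "$@" > "$out" || return 1
  # Check that the combined stream decompresses cleanly.
  gzip -t "$out"
}
```

With the files from the command above, a call might be build_super_se SWS_super_SE.trimmomatic.fq.gz SWS_SE.trimmomatic.fq.gz SWS_PE_1.trimmomatic.fq.gz SWS_PE_2.trimmomatic.fq.gz SWS_MP_*_[12].trimmomatic.fq.gz, where SWS_SE.trimmomatic.fq.gz is only a placeholder for the original SE file, which is not named in the post.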
Problem: ABySS has now spent several days reading the "super-SE" fastq.gz file. The following log is all I have.
mpirun --mca btl_sm_use_knem 0 -np 16 ABYSS-P -k66 -q3 --coverage-hist=coverage.hist -s SWS-bubbles.fa -o SWS-1.fa SWS_super_SE.trimmomatic.fq.gz
ABySS 2.0.2
ABYSS-P -k66 -q3 --coverage-hist=coverage.hist -s SWS-bubbles.fa -o SWS-1.fa SWS_super_SE.trimmomatic.fq.gz
Running on 16 processors
1: Running on host iw-k32-34
...
...
0: Running on host iw-k32-34
0: Reading `SWS_super_SE.trimmomatic.fq.gz'...
Troubleshooting: Based on my past experience with ABySS, it seems strange that reading 80 GB of fastq.gz files would take several days. I can think of two possible reasons:
1. The /1 and /2 reads of each PE and MP pair have the same read name (apart from the trailing 1 or 2), so ABySS may run into hashing problems with the super-SE file. I therefore concatenated only the /1 reads from the PE and MP libraries, but the same issue persists.
2. Some problem with Open MPI, which I know little about.
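The first hypothesis above, that mates in the PE and MP libraries share read names, can at least be checked directly. The sketch below uses two hypothetical helpers (fastq_read_names and count_shared_names are illustrative names, not ABySS tools). It assumes standard 4-line FASTQ records and read names that are unique within each file.

```shell
# Hypothetical helper: print read names from a gzipped FASTQ file, with the
# leading '@', any header comment, and any trailing /1 or /2 removed.
fastq_read_names() {
  # Header lines are every 4th line starting at line 1 (4-line records).
  gzip -dc "$1" | awk 'NR % 4 == 1 { name = substr($1, 2); sub(/\/[12]$/, "", name); print name }'
}

# Hypothetical helper: count names that occur in both files of a pair.
count_shared_names() {
  # A name present in both files appears twice in the merged, sorted list
  # (assuming names are unique within each file).
  { fastq_read_names "$1"; fastq_read_names "$2"; } | sort | uniq -d | awk 'END { print NR }'
}
```

For example, count_shared_names SWS_PE_1.trimmomatic.fq.gz SWS_PE_2.trimmomatic.fq.gz would print how many pairs share a name once the /1 and /2 suffixes are ignored. This only shows whether the names overlap, not whether the overlap is what slows ABySS down.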
Any ideas what could have gone wrong? Thank you very much in advance!
Thank you so much for your detailed instructions, @benv. That's really helpful! I'm going to try your suggested commands.
Hi @benv,
I added v=-v to my command and found that the reading step slowed down dramatically once it reached ~30%. I found a post describing a similar problem (http://seqanswers.com/forums/showthread.php?t=61602) and, following its advice, switched from Open MPI to MPICH3. This significantly sped up the reading step.
Now I am running into a strange problem in the last few steps (see the error message below). This is for k=50. Do you know of any way to work around this issue? Thank you again!
UPDATE: The run finished successfully after I switched to Open MPI 2.1.0. Thanks!
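Since the fix in this thread came down to which MPI implementation was behind mpirun, it can help to confirm that before starting a multi-day run. The sketch below is a hypothetical helper (mpi_flavor is not an ABySS or MPI command) that classifies the output of mpirun --version. The matched strings ("Open MPI" for Open MPI, "HYDRA" or "MPICH" for MPICH's Hydra launcher) are assumptions based on typical version banners and may differ between releases.

```shell
# Hypothetical helper: classify the MPI implementation from the output of
# `mpirun --version`, read on stdin. The matched strings are assumptions.
mpi_flavor() {
  text=$(cat)
  case $text in
    *"Open MPI"*) echo "openmpi" ;;
    *HYDRA*|*MPICH*) echo "mpich" ;;
    *) echo "unknown" ;;
  esac
}
```

Typical use would be mpirun --version 2>&1 | mpi_flavor. As a general rule, ABYSS-P should be launched with the mpirun from the same MPI installation it was compiled against.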
Good. Glad to hear that!
How do I assemble a single-end FASTQ file? I checked the man page, but it only describes paired-end FASTQ assembly. Can ABySS also be used to assemble single-end FASTQ files?
Sorry for the slow response. To run a single-end assembly only, specify the input single-end reads with
se="<single-end FASTQ files>"
or
in="<single-end FASTQ files>"
and add the word "unitigs" to the end of the abyss-pe command.
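Putting this together, a single-end, unitig-only run could be wrapped as below. This is a sketch: the function name se_unitigs is hypothetical, while name=, k=, se= and the unitigs target are the abyss-pe parameters and target mentioned in this thread. Choosing k is left to the caller.

```shell
# Hypothetical wrapper: run abyss-pe on single-end reads only and stop
# after the unitig stage. Usage: se_unitigs NAME K READS...
se_unitigs() {
  if [ "$#" -lt 3 ]; then
    echo "usage: se_unitigs NAME K READS..." >&2
    return 1
  fi
  asm_name=$1
  kmer=$2
  shift 2
  # All remaining arguments are joined with spaces into a single se= value.
  abyss-pe name="$asm_name" k="$kmer" se="$*" unitigs
}
```

For example, se_unitigs myasm 64 reads.fq.gz, where myasm, 64 and reads.fq.gz are placeholders, expands to abyss-pe name=myasm k=64 se=reads.fq.gz unitigs.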