Can you assemble with merged paired end reads and unmatched reads as "single ended" reads?
6.1 years ago
O.rka ▴ 740

I used BBMap to isolate r1 and r2 reads while providing unmatched reads as well. It generated a single file. Is it wise to use this in my assembler (SPAdes) as single-ended?

I ended up using the outm=./bbmap_output/mapped.fq from below as input into SPAdes as single ended with the -s flag. My mapped.fq file has not only paired r1 and r2 reads but also the singletons.

bbwrap.sh -Xmx40g in=./reads/r1.fq,$SINGLETONS in2=./reads/r2.fq,null outm=./bbmap_output/mapped.fq outu=./bbmap_output/unmapped.fq ref=./reference/assembly.fa out=./bbmap_output/output.sam lengthtag=t idtag=t covstats=./bbmap_output/output.covstats.txt rpkm=./bbmap_output/output.rpkm.txt threads=$N_JOBS usemodulo append

For SPAdes, my command was the following:

python spades-3.9.0/bin/spades.py -t $N_JOBS -s ./bbmap_output/mapped.fq -o ./spades_output/

I used BBMap to isolate r1 and r2 reads. It generated a single file

Which tool from BBMap did you use? reformat.sh? If yes, then you must have made an interleaved reads file.

You can look at SPAdes manual section 3.1 on how to specify interleaved reads.

--12 <file_name>
    File with interlaced forward and reverse paired-end reads.
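Before passing a file with --12, it may help to confirm that it really alternates forward and reverse mates. The sketch below is only an illustration: `is_interleaved` is a hypothetical helper (not part of SPAdes or BBTools), and it assumes four-line FASTQ records whose mates share a name once a trailing /1 or /2, or anything after the first space, is ignored.

```shell
# Hypothetical helper: succeeds (exit 0) only if the FASTQ file contains an
# even number of records and every consecutive pair shares the same read name.
# Assumes four lines per record (no wrapped sequence lines).
is_interleaved() {
    awk '
        NR % 4 == 1 {
            name = substr($1, 2)          # header up to first space, minus "@"
            sub(/\/[12]$/, "", name)      # drop an old-style /1 or /2 suffix
            n++
            if (n % 2 == 1) first = name
            else if (name != first) { bad = 1; exit }
        }
        END { if (bad || n == 0 || n % 2 == 1) exit 1 }
    ' "$1"
}

# Example call (the path is a placeholder):
# is_interleaved reads.fq && echo "looks interleaved"
```

A file that mixes interleaved pairs with unpaired reads would generally fail this check, because the unpaired records break the strict alternation.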

@genomax thank you. I've added my command above in the original description. Are they still interlaced if it includes both paired r1/r2 and the unpaired reads?


You used bbmap.sh to align your data to assembly.fa. That alignment produced output.sam, which is in SAM format rather than FASTQ. You seem to have captured the reads that did not align in unmapped.fq, and that file is probably interleaved, since you started with PE reads but provided only a single output file name.

If you wanted to assemble the data then there is no need to align it first. You should start with your scanned/trimmed original data files and then go into SPAdes directly after that.

Note: If this question is related to, or a follow-up on, Extracted mapped shotgun metagenomic reads to reference genome. SPAdes or metaSPAdes for de-novo assembly? then you are on the right track. You can retrieve reads that mapped to your assembly by doing

reformat.sh in=output.sam out1=R1.fq out2=R2.fq

You can then use these reads in your SPAdes assembly.


Apologies, I forgot to add the bbwrap.sh command that I ended up using. I've updated the question with the correct command and my input into spades. Can reformat.sh create singletons? If I was able to regenerate the cleaned r1.fq, r2.fq, and singletons.fq fastq files, would assembling these with spades.py -1 r1.fq -2 r2.fq -s singletons.fq result in a better assembly than spades.py -s mapped.fq where mapped.fq has everything merged into one?
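To make the two options concrete, here is a rough sketch of how a mixed file could be separated back into r1, r2, and singleton files before calling spades.py with -1, -2, and -s. `split_mixed` is a hypothetical helper written for illustration, not a BBTools command. It assumes four-line FASTQ records and that mates sit directly next to each other in the file, which may not hold for every mapped.fq. BBTools also ships repair.sh for re-pairing reads; its usage text is the place to check the exact options.

```shell
# Hypothetical helper.
# usage: split_mixed mixed.fq r1.fq r2.fq singletons.fq
split_mixed() {
    : > "$2"; : > "$3"; : > "$4"      # make sure all three outputs exist
    awk -v r1="$2" -v r2="$3" -v s="$4" '
        # Read name without "@", without /1 or /2, without the comment part.
        function key(h,   a) {
            split(h, a, " ")
            sub(/^@/, "", a[1])
            sub(/\/[12]$/, "", a[1])
            return a[1]
        }
        {
            rec = rec (NR % 4 == 1 ? "" : "\n") $0   # collect one 4-line record
            if (NR % 4 == 1) hdr = $0
            if (NR % 4 != 0) next
            k = key(hdr)
            if (held != "" && k == heldkey) {
                # Current record is the mate of the one being held.
                print held > r1
                print rec > r2
                held = ""
            } else {
                # The held record (if any) never found a mate.
                if (held != "") print held > s
                held = rec
                heldkey = k
            }
            rec = ""
        }
        END { if (held != "") print held > s }
    ' "$1"
}

# Example call (paths are placeholders):
# split_mixed mapped.fq r1.fq r2.fq singletons.fq
```

Keeping the pairs separate lets the assembler use the pairing information, which is lost when everything is supplied through -s alone.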


Unless you have a really high number of singletons, I suggest that you ignore them for now and try an assembly with properly paired reads.
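To judge whether the number of singletons is high, one option is to count them against the paired reads. The sketch below is an illustration only: `count_reads` and `singleton_percent` are hypothetical helper names, and the counting assumes four-line FASTQ records.

```shell
# Count FASTQ records, assuming four lines per record (no wrapped sequences).
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print the percentage of all reads that are singletons.
# usage: singleton_percent r1.fq r2.fq singletons.fq
singleton_percent() (
    r1=$(count_reads "$1")
    r2=$(count_reads "$2")
    s=$(count_reads "$3")
    awk -v p="$((r1 + r2))" -v s="$s" 'BEGIN {
        if (p + s == 0) print "0.0"
        else printf "%.1f\n", 100 * s / (p + s)
    }'
)

# Example call (paths are placeholders):
# singleton_percent r1.fq r2.fq singletons.fq
```

The thread does not give a cutoff for what counts as "really high", so the percentage is only an input to that judgment.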


GenoMax Hi, sorry to reply on such an old post, but I have a similar question. The kneaddata output has unmatched_1.fq and unmatched_2.fq, which are reads whose mates were lost but which themselves passed both the trimmomatic and bowtie2 steps. In this case, at what step would reads without mates be an issue in downstream processing? Thanks in advance


You will need to provide some context. What exactly are you trying to do?


My apologies, I wish to use these sequences for taxonomic and functional profiling (plan to use metaphlan and humann) and also to assemble using megahit/metaspades. I wish to know if using those unmatched reads will affect these steps?


I am not sure which steps you are referring to. If programs you are planning to use can use the singleton reads then you would be able to use them. If they require paired-end data then these reads are not going to be useful.

6.1 years ago
O.rka ▴ 740

An adaptation from @genomax. I ran BBMap on the paired reads and on the singletons separately, then de-interleaved the mapped paired reads.

 bbmap.sh in=./reads/r1.fq in2=./reads/r2.fq ref=./reference/assembly.fa outm=$R1_R2 lengthtag=t idtag=t threads=$N_JOBS usemodulo
 reformat.sh in=$R1_R2 out1=$R1 out2=$R2
 bbmap.sh in=$SINGLETONS ref=./reference/assembly.fa outm=./bbmap_output/singletons.mapped.fq lengthtag=t idtag=t threads=$N_JOBS usemodulo
 python spades.py -t $N_JOBS -1 $R1 -2 $R2 -s ./bbmap_output/singletons.mapped.fq -o ./spades_output/
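For readers who want to see what the de-interleaving step amounts to, the reformat.sh call above can be approximated in plain awk. This is a sketch under assumptions, not a replacement for reformat.sh: `deinterleave` is a hypothetical helper, and it assumes a strictly interleaved file with four lines per record.

```shell
# Hypothetical helper: write records 1, 3, 5, ... to the R1 file and
# records 2, 4, 6, ... to the R2 file.
# usage: deinterleave interleaved.fq r1.fq r2.fq
deinterleave() {
    awk -v r1="$2" -v r2="$3" '
        { rec = int((NR - 1) / 4) }      # 0-based index of the current record
        rec % 2 == 0 { print > r1 }
        rec % 2 == 1 { print > r2 }
    ' "$1"
}

# Example call (paths are placeholders):
# deinterleave mapped_pairs.fq r1.fq r2.fq
```

Unlike reformat.sh, this sketch does no validation, so it would silently mis-assign reads if the input also contained unpaired records.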