6.1 years ago
O.rka
I used BBMap to isolate r1 and r2 reads while providing unmatched reads as well. It generated a single file. Is it wise to use this in my assembler (SPAdes) as single-ended?
I ended up using the outm=./bbmap_output/mapped.fq output from the command below as input into SPAdes as single-ended with the -s flag. My mapped.fq file has not only paired r1 and r2 reads but also the singletons.
bbwrap.sh -Xmx40g in=./reads/r1.fq,$SINGLETONS in2=./reads/r2.fq,null outm=./bbmap_output/mapped.fq outu=./bbmap_output/unmapped.fq ref=./reference/assembly.fa out=./bbmap_output/output.sam lengthtag=t idtag=t covstats=./bbmap_output/output.covstats.txt rpkm=./bbmap_output/output.rpkm.txt threads=$N_JOBS usemodulo append
For SPAdes, my command was the following:
python spades-3.9.0/bin/spades.py -t $N_JOBS -s ./bbmap_output/mapped.fq -o ./spades_output/
Which tool from BBMap did you use? reformat.sh? If yes, then you must have made an interleaved reads file. You can look at SPAdes manual section 3.1 on how to specify interleaved reads.
@genomax thank you. I've added my command above in the original description. Are they still interlaced if it includes both paired r1/r2 and the unpaired reads?
You used bbmap.sh to align your data to assembly.fa. You obtained the file output.sam from that alignment, which is in SAM format rather than fastq. You seem to have captured the reads that did not align in unmapped.fq, and that file is probably in interleaved format, since you started with PE reads but provided only a single output file name.
If you wanted to assemble the data, then there is no need to align it first. You should start with your scanned/trimmed original data files and then go into SPAdes directly after that.
Note: If this question is related to, or a follow-up on, "Extracted mapped shotgun metagenomic reads to reference genome. SPAdes or metaSPAdes for de-novo assembly?" then you are on the right track. You can retrieve reads that mapped to your assembly by doing
You can then use these reads in your SPAdes assembly.
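Whether a single FASTQ file really is interleaved can be checked directly rather than guessed. The following is a minimal Python sketch, not part of BBMap or SPAdes: the helper names (read_fastq, mate_info, is_strictly_interleaved) are hypothetical, and it assumes four-line FASTQ records whose headers mark the mate either as a /1 or /2 suffix or as an Illumina-style " 1:" or " 2:" field. A file that mixes pairs with singletons would fail this check.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            return
        yield tuple(record)


def mate_info(header):
    """Return (read_name, mate) where mate is 1, 2, or None if it cannot be told.

    Assumption: mates are marked as '@name/1' or as '@name 1:N:0:...'.
    """
    fields = header[1:].split() or [""]
    name = fields[0]
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2], int(name[-1])
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return name, int(fields[1][0])
    return name, None


def is_strictly_interleaved(handle):
    """True only if records come as consecutive (mate 1, mate 2) pairs with the same name."""
    records = read_fastq(handle)
    for first in records:
        second = next(records, None)
        if second is None:
            return False  # odd number of records, so at least one singleton
        name1, mate1 = mate_info(first[0])
        name2, mate2 = mate_info(second[0])
        if name1 != name2 or (mate1, mate2) != (1, 2):
            return False
    return True
```

For a file on disk, the check would be called as is_strictly_interleaved(open("mapped.fq")), with mapped.fq being the file from the question.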
Apologies, I forgot to add the bbwrap.sh command that I ended up using. I've updated the question with the correct command and my input into SPAdes. Can reformat.sh create singletons? If I was able to regenerate the cleaned r1.fq, r2.fq, and singletons.fq fastq files, would assembling these with spades.py -1 r1.fq -2 r2.fq -s singletons.fq result in a better assembly than spades.py -s mapped.fq, where mapped.fq has everything merged into one?
Unless you have a really high number of singletons, I suggest that you ignore them for now and try an assembly with properly paired reads.
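To judge whether the number of singletons is high, and to get back to separate paired and unpaired inputs, the merged file can be split by read name. This is a rough Python sketch under stated assumptions, not a BBTools or SPAdes feature: the function names are hypothetical, records are four lines each, mates are marked as /1 and /2 or as Illumina-style " 1:" and " 2:" fields, and all unmatched reads fit in memory.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            return
        yield tuple(record)


def mate_info(header):
    """Return (read_name, mate); mate is 1, 2, or None when the header does not say."""
    fields = header[1:].split() or [""]
    name = fields[0]
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2], int(name[-1])
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return name, int(fields[1][0])
    return name, None


def split_mixed_fastq(handle):
    """Split a mixed FASTQ into (r1, r2, singletons) lists of records.

    r1[i] and r2[i] are mates; reads whose partner never appears are singletons.
    """
    pending = {}  # read name -> (mate, record) still waiting for its partner
    r1, r2, singletons = [], [], []
    for record in read_fastq(handle):
        name, mate = mate_info(record[0])
        if mate is None:
            singletons.append(record)
            continue
        waiting = pending.pop(name, None)
        if waiting is None:
            pending[name] = (mate, record)
        elif waiting[0] == mate:
            # Same mate seen twice: treat the earlier one as a singleton.
            singletons.append(waiting[1])
            pending[name] = (mate, record)
        else:
            first, second = (record, waiting[1]) if mate == 1 else (waiting[1], record)
            r1.append(first)
            r2.append(second)
    singletons.extend(rec for _, rec in pending.values())
    return r1, r2, singletons


def write_fastq(records, path):
    """Write records back out as plain 4-line FASTQ."""
    with open(path, "w") as out:
        for record in records:
            out.write("\n".join(record) + "\n")
```

Comparing len(singletons) with len(r1) gives the singleton fraction, and the three lists written out with write_fastq correspond to the r1.fq, r2.fq, and singletons.fq inputs mentioned in the question. BBTools also ships a repair.sh script intended for re-pairing reads; its own documentation is the place to check the exact options.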
GenoMax Hi, sorry to reply on such an old post but I have a similar question. The kneaddata output has unmatched_1.fq and unmatched_2.fq, which contain reads whose mates were lost but which themselves passed both the trimmomatic and bowtie2 steps. In this case, at what step would having reads without mates be an issue in downstream processing? Thanks in advance.
You will need to provide some context. What exactly are you trying to do?
My apologies, I wish to use these sequences for taxonomic and functional profiling (plan to use metaphlan and humann) and also to assemble using megahit/metaspades. I wish to know if using those unmatched reads will affect these steps?
I am not sure which steps you are referring to. If programs you are planning to use can use the singleton reads then you would be able to use them. If they require paired-end data then these reads are not going to be useful.