I have many samples, each with reads in two files: ${interleaved}.fastq and ${orphaned}.fastq. The assembly, used here as reference, was constructed with megahit
using both sets.
Correct me if I'm wrong, but to get accurate coverage statistics I need to, for each sample, map reads from the interleaved and the orphaned file onto the assembly. Can bowtie2
or some other tool do that? I was thinking a one-line command for each sample, like below (no for loop here, my shell script skills are modest)
bowtie2 -p 20 -x final.contigs.build.fa --interleaved ${interleaved}.fastq -U ${orphaned}.fastq -S sample_x.sam 2> sample_x.out
Would this work? I haven't tested this on very many samples, but for a couple of them the size of sample_x.sam output file appears the same, with or without the -U argument in the command, if the --interleaved argument is left in place.
I was once told to map interleaved and orphaned files in two separate steps and then combine output .sam files using samtools. If this is the better option, which samtools command(s) would accomplish that?
Either way, the goal is the same: a .sam
and later a sorted .bam
file for each sample.
You can use
bbwrap.sh
from BBMap suite to map paired-end and singletons together like this.bbwrap.sh in1=read1.fq,singletons.fq in2=read2.fq,null out=mapped.sam append
reformat.sh
can de-interleave your reads easily.reformat.sh in=interleaved.fq.gz out1=Read1.fg.gz out2=Read2.fq.gz
.Thanks, @genomax!! Your answer works and may be the most straightforward option.
I am still holding out for a
bowtie2
solution, perhaps mediated bysamtools
. The only reason is my amateur understanding ofbbmap
performance (large false positive rate).Thanks for link. I will let @Brian (author of BBMap) know about it.
Couple of things. That blog link is from 2015 and BBMap is continuously evolving. It is very flexible (as they author of that blog admitted). If your data is not metagenomic then you don't need to worry.
Does
orphaned
refer to reads that were not used in the assembly or just those reads whose partner missing (from a pair)?orphaned
are no-partner reads.megahit
can take interleaved and orphaned input to produce one assembly (at least it didn't give me an error), like thisOk. So they were part of the assembly. Just wanted to make sure.