Entering edit mode
6.4 years ago
pinn
▴
210
Hi,
I aligned a human genome dataset with two different aligners 1) BWAmem 2) Soap2. while comparing the alignment output for a aligned bam files I find a lot of difference.
1) no. of mapped reads 2) properly paired
,any suggestions ..
**BWAmem**
samtools flagstat -@ 10 SRR2962693_BWAmem.sam
118628072 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
63900 + 0 supplementary
0 + 0 duplicates
**118444503 + 0 mapped (99.85% : N/A)**
118564172 + 0 paired in sequencing
59282086 + 0 read1
59282086 + 0 read2
**117806380 + 0 properly paired (99.36% : N/A)**
118326136 + 0 with itself and mate mapped
54467 + 0 singletons (0.05% : N/A)
177530 + 0 with mate mapped to a different chr
130003 + 0 with mate mapped to a different chr (mapQ>=5)
Soap2
samtools flagstat -@ 10 SRR2962693_3.sam
1948750 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
**1948750 + 0 mapped (100.00% : N/A)**
1948750 + 0 paired in sequencing
974375 + 0 read1
974375 + 0 read2
**1948750 + 0 properly paired (100.00% : N/A)**
1948750 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Here are the commands used:
Soap2
soap -D soap2.21release/index/hg38.fa.index -a benchmarkdatasets/sickle_SRR2052405_R1q20.fastq -b benchmarkdatasets/sickle_SRR2052405_R2q20.fastq -o SRR2052405.soap -2 unpaired-hits -u unmapped.fq -l 256 -p 5
BWA
bwa mem -t 10 BWA-MEM/hg38 sample1/fixed1_normal.fq sample1/fixed2_normal.fq > fixed.sam
Just asking out of interest. Why are you comparing these two aligners? I am asking because sometimes one wastes a lot of time doing these kind of comparisons, and in the end, you use BWA anyway, because it is the most widely used, well-documented, maintained and accepted software out there, especially when it comes to mutation calling and structural variant detection. You might consider "simply" to use. I say that because I myself wasted a lot of time doing things like this.
Something about this result is not right. Why are there 10x more reads in
bwa
stats thansoap2
if this is the same exact input file?OP needs to add their exact alignment commands
Soap2 CMD
BWA CMD
Please edit your question and add this in there. Also, use the code formatting option to present your post better.
yes, the exact same file. I performed alignment on 4 different human genome datasets, more reads in bwa ?. I tried with other aligners like cushaw3, bowtie2, Kart and Whisper the stats doesn't differ much among these aligners when compare with soap2 stats I found lot of difference. How to sort it out ?
I agree with ATpoint that this probably is not worth your time to puzzle out. You have sane answers from multiple other aligners, there is something wrong with soap (maybe your genome index is corrupted?) so I'd just move on and use bowtie2 or bwa, on the grounds that they are more popular; it's easier to get help with them, and it's easier for others to replicate your work with them (Is that true of bowtie2 anymore?). Rather than finding the aligner that gives you a fraction of a percentage better alignment numbers, better to pick one, and study it and its options to know how to use it best.
Thanks for your suggestion.