Comparing alignment stats BWAmem vs. Soap2 aligner
0
0
Entering edit mode
6.4 years ago
pinn ▴ 210

Hi,

I aligned a human genome dataset with two different aligners 1) BWAmem 2) Soap2. while comparing the alignment output for a aligned bam files I find a lot of difference.

1) no. of mapped reads 2) properly paired

,any suggestions ..

**BWAmem** 
samtools flagstat -@ 10  SRR2962693_BWAmem.sam 
118628072 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
63900 + 0 supplementary
0 + 0 duplicates
**118444503 + 0 mapped (99.85% : N/A)**
118564172 + 0 paired in sequencing
59282086 + 0 read1
59282086 + 0 read2
**117806380 + 0 properly paired (99.36% : N/A)**
118326136 + 0 with itself and mate mapped
54467 + 0 singletons (0.05% : N/A)
177530 + 0 with mate mapped to a different chr
130003 + 0 with mate mapped to a different chr (mapQ>=5)

Soap2

samtools flagstat -@ 10 SRR2962693_3.sam
1948750 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
**1948750 + 0 mapped (100.00% : N/A)**
1948750 + 0 paired in sequencing
974375 + 0 read1
974375 + 0 read2
**1948750 + 0 properly paired (100.00% : N/A)**
1948750 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Here are the commands used:

Soap2

soap -D soap2.21release/index/hg38.fa.index -a benchmarkdatasets/sickle_SRR2052405_R1q20.fastq -b benchmarkdatasets/sickle_SRR2052405_R2q20.fastq -o SRR2052405.soap -2 unpaired-hits -u unmapped.fq -l 256 -p 5

BWA

bwa mem -t 10 BWA-MEM/hg38 sample1/fixed1_normal.fq sample1/fixed2_normal.fq > fixed.sam
genome next-gen alignment • 3.5k views
ADD COMMENT
2
Entering edit mode

Just asking out of interest. Why are you comparing these two aligners? I am asking because sometimes one wastes a lot of time doing these kind of comparisons, and in the end, you use BWA anyway, because it is the most widely used, well-documented, maintained and accepted software out there, especially when it comes to mutation calling and structural variant detection. You might consider "simply" to use. I say that because I myself wasted a lot of time doing things like this.

ADD REPLY
1
Entering edit mode

Something about this result is not right. Why are there 10x more reads in bwa stats than soap2 if this is the same exact input file?

ADD REPLY
0
Entering edit mode

OP needs to add their exact alignment commands

ADD REPLY
0
Entering edit mode

Soap2 CMD

soap -D soap2.21release/index/hg38.fa.index -a benchmarkdatasets/sickle_SRR2052405_R1q20.fastq -b benchmarkdatasets/sickle_SRR2052405_R2q20.fastq -o SRR2052405.soap -2 unpaired-hits -u unmapped.fq -l 256 -p 5

BWA CMD

bwa mem -t 10 BWA-MEM/hg38 sample1/fixed1_normal.fq sample1/fixed2_normal.fq > fixed.sam
ADD REPLY
0
Entering edit mode

Please edit your question and add this in there. Also, use the code formatting option to present your post better.

code_formatting

ADD REPLY
0
Entering edit mode

yes, the exact same file. I performed alignment on 4 different human genome datasets, more reads in bwa ?. I tried with other aligners like cushaw3, bowtie2, Kart and Whisper the stats doesn't differ much among these aligners when compare with soap2 stats I found lot of difference. How to sort it out ?

ADD REPLY
0
Entering edit mode

I agree with ATpoint that this probably is not worth your time to puzzle out. You have sane answers from multiple other aligners, there is something wrong with soap (maybe your genome index is corrupted?) so I'd just move on and use bowtie2 or bwa, on the grounds that they are more popular; it's easier to get help with them, and it's easier for others to replicate your work with them (Is that true of bowtie2 anymore?). Rather than finding the aligner that gives you a fraction of a percentage better alignment numbers, better to pick one, and study it and its options to know how to use it best.

ADD REPLY
0
Entering edit mode

Thanks for your suggestion.

ADD REPLY

Login before adding your answer.

Traffic: 2623 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6