Hi all,
I have a question regarding bwa-mem and bwa aln. I am trying to align 150 bp paired end data against a genome assembly using bwa mem. However, I am seeing a long peak of very short inserts for alignments which is not expected according to the lab reports. Just to verify I did the alignments using bwa bactrack and the peak has disappeared. Logs look very different too in terms of low and high boundries of insert length. Are there any insert size cutoffs set for both these algorithms or is there something in the way they align which causes this discrepancy?
BWA-bactrack:
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... [bwa_aln_core] calculate SA coordinate... 195.76 sec
[bwa_aln_core] write to the disk... 204.87 sec
[bwa_aln_core] write to the disk... [infer_isize] (25, 50, 75) percentile: (173, 231, 337)
[infer_isize] low and high boundaries: **150 and 665** for estimating avg and std
[infer_isize] inferred external isize from 26006 pairs: 271.186 +/- 116.759
[infer_isize] skewness: 1.187; kurtosis: 0.736; ap_prior: 3.73e-05
[infer_isize] inferred maximum insert size: 1029 (6.49 sigma)
[bwa_sai2sam_pe_core] time elapses: 5.22 sec
[bwa_sai2sam_pe_core] changing coordinates of 18261 alignments.
bwa-mem
[M::main_mem] read 69182 sequences (10000020 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (113, 4047, 36, 90)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (175, 331, 548)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1294)
[M::mem_pestat] mean and std.dev: (369.65, 270.10)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1667)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (144, 217, 329)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: **(1, 699)**
[M::mem_pestat] mean and std.dev: (242.81, 136.88)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 884)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (101, 208, 352)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 854)
[M::mem_pestat] mean and std.dev: (224.74, 141.14)
Any light you could shed on this will be very helpful Many thanks.
Not answering your question but would you mind checking the insert sizes (you can sub-sample a set if you don't want to use full files) using BBMap (if your reads don't overlap) or BBMerge (if R1/R2 overlap). Both programs are part of BBMap suite. You will want to use
ihist=
flag to create a file with insert size distribution.Sure. I will give that a go and let you know what I get.