Entering edit mode
5.7 years ago
zhangdengwei
▴
210
Dear all,
I am confused about how the insert length (9th column) was computed in SAM file, for instance, the two PE50 reads below were aligned to reference genome, and the 4th column is 1-based leftmost mapping POSition for each read. For CL200085426L1C001R001_4504, the insert size should be equal to the following equation,
87845148 - 87845071 + 50 = 127
the result is the same as the 9th column, while it is incorrect for CL200085426L1C001R001_4486 with insert length of 11 using the same computing method. So anyone knows how to fetch the insert size? Thanks very much!
CL200085426L1C001R001_4486 83 chr16 34103407 55 50M = 34103466 11
CL200085426L1C001R001_4486 163 chr16 34103466 59 50M = 34103407 -11
CL200085426L1C001R001_4504 99 chr5 87845071 60 50M = 87845148 127
CL200085426L1C001R001_4504 147 chr5 87845148 60 50M = 87845071 -127
The 4486 pair are an "outie" mapping — the read at the lower coordinate is the reversed one. So you would have to tell us more about your sequencing protocol for us to venture an opinion on why this has been marked as a proper pair and had an insert size calculated at all.
Indeed: Which tool produced these alignments?
My sequencing protocol is circle-Seq, which can be found in this paper.
I have processed CIRCLE-seq data. Are you not using the proprietary software? - https://github.com/tsailabSJ/circleseq
You do not have to specify the insert size.
Yes, thanks. I have used that software to deal with my data, due to a slight difference between circle-seq data and mine, so I had altered the analysis method in order to fit my own data.
here it's just MPOS -POS. Which tool produced this sam ?
I used BWA mem to align paired-end reads to reference genome.
which version of bwa mem, it seems that the code isn't just MPOS-POS https://github.com/lh3/bwa/blob/f090f93460a0379d78a97bfa6133bae5bc8283a6/bwamem.c#L862
my version of BWA is 0.7.17-r1188