How to convert a CIGAR string to an extended CIGAR string?
I have a CIGAR string and the MD tag of the corresponding read record, and I want to get the extended CIGAR string. Is there any Java/C/C++ code or library that allows me to do that?
cigar
cpp
bam
extended-cigar
sam
reformat.sh from the BBMap suite:
reformat.sh in=your.bam out=new.bam sam=1.4
I wrote samfixcigar: http://lindenb.github.io/jvarkit/SamFixCigar.html
$ cat toy.sam
@SQ SN:ref LN:45
@SQ SN:ref2 LN:40
r001 163 ref 7 30 8M4I4M1D3M = 37 39 TTAGATAAAGAGGATACTG * XX:B:S,12561,2,20,112
r002 0 ref 9 30 1S2I6M1P1I1P1I4M2I * 0 0 AAAAGATAAGGGATAAA *
r003 0 ref 9 30 5H6M * 0 0 AGCTAA *
r004 0 ref 16 30 6M14N1I5M * 0 0 ATAGCTCTCAGC *
r003 16 ref 29 30 6H5M * 0 0 TAGGC *
r001 83 ref 37 30 9M = 7 -39 CAGCGCCAT *
x1 0 ref2 1 30 20M * 0 0 aggttttataaaacaaataa ????????????????????
x2 0 ref2 2 30 21M * 0 0 ggttttataaaacaaataatt ?????????????????????
x3 0 ref2 6 30 9M4I13M * 0 0 ttataaaacAAATaattaagtctaca ??????????????????????????
x4 0 ref2 10 30 25M * 0 0 CaaaTaattaagtctacagagcaac ?????????????????????????
x5 0 ref2 12 30 24M * 0 0 aaTaattaagtctacagagcaact ????????????????????????
x6 0 ref2 14 30 23M * 0 0 Taattaagtctacagagcaacta ???????????????????????
$ java -jar dist/samfixcigar.jar \
-r samtools-0.1.19/examples/toy.fa \
samtools-0.1.19/examples/toy.sam
output:
@HD VN:1.4 SO:unsorted
@SQ SN:ref LN:45
@SQ SN:ref2 LN:40
r001 163 ref 7 30 8=4I4=1D3= = 37 39 TTAGATAAAGAGGATACTG * XX:B:S,12561,2,20,112
r002 0 ref 9 30 1S2I6=1P1I1P1I1X1=2X2I * 0 0 AAAAGATAAGGGATAAA *
r003 0 ref 9 30 2=1X3= * 0 0 AGCTAA *
r004 0 ref 16 30 6=14N1I5= * 0 0 ATAGCTCTCAGC *
r003 16 ref 29 30 5= * 0 0 TAGGC *
r001 83 ref 37 30 9= = 7 -39 CAGCGCCAT *
x1 0 ref2 1 30 16=1X3= * 0 0 AGGTTTTATAAAACAAATAA ????????????????????
x2 0 ref2 2 30 15=1X3=1X1= * 0 0 GGTTTTATAAAACAAATAATT ?????????????????????
x3 0 ref2 6 30 9=4I13= * 0 0 TTATAAAACAAATAATTAAGTCTACA ??????????????????????????
x4 0 ref2 10 30 1X3=1X20= * 0 0 CAAATAATTAAGTCTACAGAGCAAC ?????????????????????????
x5 0 ref2 12 30 2=1X21= * 0 0 AATAATTAAGTCTACAGAGCAACT ????????????????????????
x6 0 ref2 14 30 1X22= * 0 0 TAATTAAGTCTACAGAGCAACTA ???????????????????????
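For anyone curious how a reference-based conversion like the one above can work, here is a minimal Python sketch. It is not the samfixcigar source, only an illustration of the idea: walk the CIGAR, and for every M run compare the read bases to the reference bases and emit = or X. The function name `extend_cigar_with_reference` and its arguments are made up for this example, and it assumes the read sequence is present (not `*`) and the whole contig is available as a string.

```python
import re
from itertools import groupby

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def extend_cigar_with_reference(cigar, read_seq, ref_seq, pos):
    """Split every M into =/X runs by comparing the read to the reference.

    pos is the 1-based leftmost mapping position (SAM POS column) and
    ref_seq is the whole contig sequence; both names are hypothetical.
    """
    ref_i = pos - 1   # 0-based index into the reference
    read_i = 0        # 0-based index into the read
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "M":
            # Case-insensitive base-by-base comparison.
            flags = [
                "=" if read_seq[read_i + k].upper() == ref_seq[ref_i + k].upper() else "X"
                for k in range(length)
            ]
            # Collapse consecutive identical flags into (run length, flag).
            out.extend((len(list(run)), flag) for flag, run in groupby(flags))
            read_i += length
            ref_i += length
        else:
            out.append((length, op))
            if op in "IS":        # consume read bases only
                read_i += length
            elif op in "DN":      # consume reference bases only
                ref_i += length
            elif op in "=X":      # already extended: consume both
                read_i += length
                ref_i += length
            # H and P consume neither read nor reference
    return "".join(f"{n}{op}" for n, op in out)

# Made-up toy data: one mismatch at the 4th base.
print(extend_cigar_with_reference("10M", "ACGAACGTAC", "ACGTACGTAC", 1))
```

The same loop ports almost line for line to Java or C++ if you need it inside an existing pipeline.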
What is an "extended cigar string"? Please give an example of input/output.
AFAIK it uses nucleotide match/mismatch operators (=, X) instead of an alignment match (M).
2 examples: - cigar string "100M" MD tag "43C5C43T6" output = "43=C5=C43=T6=".
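If only the CIGAR and the MD tag are at hand (no reference), the conversion can be sketched as follows. This is a rough Python sketch rather than a tested library routine: the names `md_to_flags` and `extend_cigar_with_md` are made up here, and it emits standard SAM =/X operators (so "43=1X5=1X43=1X6=" for the example above) rather than embedding the reference bases in the string. It relies on the MD tag describing exactly the bases covered by M operations, with deletions (`^...`) already accounted for by the CIGAR D operations.

```python
import re
from itertools import groupby

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a match count, a deletion (^ plus bases), or one mismatched base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")

def md_to_flags(md):
    """Expand an MD tag into one '=' or 'X' per aligned (M) read base."""
    flags = []
    for count, deletion, mismatch in MD_TOKEN.findall(md):
        if count:
            flags.append("=" * int(count))
        elif mismatch:
            flags.append("X")
        # Deleted reference bases have no read base, so they add no flag.
    return "".join(flags)

def extend_cigar_with_md(cigar, md):
    """Rewrite M operations as =/X runs using the MD tag (hypothetical helper)."""
    flags = md_to_flags(md)
    i = 0
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "M":
            chunk = flags[i:i + length]
            if len(chunk) != length:
                raise ValueError("MD tag is shorter than the CIGAR M operations")
            i += length
            # Collapse consecutive identical flags into (run length, flag).
            out.extend((len(list(run)), flag) for flag, run in groupby(chunk))
        else:
            if op in "=X":
                i += length   # already extended, but still covered by MD
            out.append((length, op))
    if i != len(flags):
        raise ValueError("MD tag is longer than the CIGAR M operations")
    return "".join(f"{n}{op}" for n, op in out)

print(extend_cigar_with_md("100M", "43C5C43T6"))
```

Insertions, soft/hard clips, padding and skipped regions pass through unchanged, because the MD tag says nothing about them.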