How to convert extended CIGAR to regular CIGAR
7.8 years ago
artemd ▴ 20

I have a SAM file with extended CIGAR strings like this:

17H87=1D12=1D2=3D32=1D4=2D5=2D13=....

I want to convert it to a SAM file with regular CIGAR strings, which would look like this:

17S83M1D14M1D4M3D31M1D2M2D6M...

(These two strings do not come from the same read; they only illustrate the problem.)

Does anyone know of software or a script that converts a SAM/BAM file with extended CIGAR to one with regular CIGAR? It seems like a standard problem that might already have a solution, so I wanted to ask here before solving it the hard way (by coding it myself).


Assuming you have a SAM file named old.sam:

 samtools view -Sh old.sam | awk 'BEGIN {FS = OFS = "\t"} /^@/ {print; next} {gsub(/[=X]/, "M", $6); print}' > new.sam

Disclaimer: I haven't tested this on a real SAM file, as I don't have one available right now.

Argh. No. Wrong. Sorry. Disregard the first version.

Edit: I misread what you wanted at first; it should now replace = and X in the extended CIGAR with M.


Thanks for the reply. It looks very elegant and will probably work well when rapid testing is needed.

7.8 years ago
GenoMax 147k

reformat.sh from the BBMap suite can do this:

 reformat.sh in=orig.sam out=new.sam sam=1.3


Forgot to say thanks, so thanks. I will be using this. (I have used BBMap for trimming/filtering; strangely, their manual for reformat does not mention CIGAR at all.)


Developers probably get to the manual updates last. I think @Brian pretty much single-handedly develops the BBMap suite, so we should cut him a bit of slack :)


Sure, I'm not complaining or berating. I was just saying that although I had used the tool, I didn't know I could use it for this purpose as well. Great tool overall. Thanks for your help.

7.8 years ago

I've quickly written one: https://github.com/lindenb/jvarkit/wiki/Biostar234081

 $ cat toy.sam 
@SQ SN:ref  LN:45
@SQ SN:ref2 LN:40
r001    163 ref 7   30  1M2X5=4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG  *  XX:B:S,12561,2,20,112

 $ java -jar dist/biostar234081.jar toy.sam 
@HD VN:1.5  SO:unsorted
@SQ SN:ref  LN:45
@SQ SN:ref2 LN:40
r001    163 ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG  *  XX:B:S,12561,2,20,112
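
For readers who would rather not install anything, the conversion shown in the toy example can be sketched in a few lines of Python. This is only an illustration of the idea, not the jvarkit code: the function names `to_regular_cigar` and `convert_sam_line` are made up for this sketch, and it assumes a tab-delimited SAM line with the CIGAR in column 6. Unlike a plain character substitution, which would leave `1M2X5=` as `1M2M5M`, it also merges adjacent runs of the same operation.

```python
import re

# One CIGAR operation: a run length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
VALID_CIGAR = re.compile(r"(?:\d+[MIDNSHP=X])+")


def to_regular_cigar(cigar: str) -> str:
    """Rewrite '=' and 'X' as 'M' and merge adjacent runs of the same op."""
    if cigar == "*":  # CIGAR not available; leave it untouched
        return cigar
    if not VALID_CIGAR.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    merged = []  # list of [length, op] pairs
    for length, op in CIGAR_OP.findall(cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)  # extend the previous run
        else:
            merged.append([int(length), op])
    return "".join(f"{length}{op}" for length, op in merged)


def convert_sam_line(line: str) -> str:
    """Convert the CIGAR (column 6) of one SAM line; headers pass through."""
    if line.startswith("@"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    fields[5] = to_regular_cigar(fields[5])
    return "\t".join(fields) + newline
```

With this sketch, `to_regular_cigar("1M2X5=4I4M1D3M")` returns `"8M4I4M1D3M"`, which matches the CIGAR in the output above. Hard clips (`H`) and all other operations are left as they are.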

Thanks, I checked the code and it looks like it would handle this problem robustly. I expected something exactly like this to exist somewhere, and now that you have built it (very fast), it exists here.

Thanks again.
