How to convert CIGAR string to VCF?
1
I am aligning two sequences with an aligner that outputs a CIGAR string. Is there a way to convert the CIGAR string to a VCF file with variants describing the differences between the two sequences (and with either one as a reference). Thank you!
CIGAR
VCF
alignment
variant-calling
• 350 views
pseudo code:
refpos1 = < something-clipped-start>
readpos0= 0;
for( CigarElement ce: cigar) {
char co = ce.getOperator( ) ;
int length = ce.getLength( ) ;
switch( co) {
case 'P' : break ;
case 'S' :
readpos0+= length;
break ;
case 'H' :
break ;
case 'I' :
print( contig+"\t" refpos1+"\tINSERTION_LEN=" +length) ;
readpos0+= length;
break ;
case 'D' : case 'N' :
print( contig+"\t" refpos1+"\tDELETION_LEN=" +length) ;
refpos1+= length;
break ;
case 'M' : case '=' : case 'X' :
for( i= 0; i< length; i++) {
char c1 = genome[ contig] [ refpos1-1] ;
char c2 = readseq[ readpos0] ;
if( c1!= c2) {
print( contig+"\t" refpos1+"\t" +c1+"\t" +c2) ;
}
refpos1++;
readpos0++;
}
break ;
}
}
Login before adding your answer.
Traffic: 2804 users visited in the last hour
i think this is a good exercise (I wrote a quick code that I have used for cigar parsing in javascript https://cmdcolin.github.io/posts/2022-02-06-sv-sam#appendix-b---cigar-parsing) but you might also consider converting your data into standardized records like PAF or SAM, and then standardized variant called tools can work on it. PAF notably has the "paftools call" command from minimap2, though it uses the cs tag instead of cigar https://github.com/lh3/minimap2/blob/master/cookbook.md#calling-variants-from-assembly-to-reference-alignment