Library To Parse Absolute Reference Positions Of Indels From A Bam Cigar String
2
1
Entering edit mode
11.4 years ago
William ★ 5.3k

Is there a library or piece of code that I can use to parse the absolute reference positions of large indels ( +100 bp ) from a cigar string like one shown below? I looked in Picard but I couldn't find it, maybe I missed it? It would be nice to have the code in java.

I could try to write something myself but there is a ton of tools that use bam files and they all must be able to parse a bam file and the cigar string. But I can't find a library or piece of code that I can reuse to parse absolute locations of indels from the cigar string.

The cigar string shown below is from the actual alignment of a bac contig with bwa-mem against a reference. I would like to use bac alignments like one to extract a truth set of large indels. To see how good my NGS data based SV calling is.

bac_contig1   2064    1       13884324        60      1330M4D814M4D2891M1D82M136I973M6I2277M1D3M5D2M4D8M4I12M124D5M30D5M53D6M84D4M156D4M12D11M206D10M435D5M16D6M27D3M72D6M19D4M73D5M254D3M103D5M29D7M898D23M6D202M24D37M2D1341M16I2454M3D1293M11I8M1D9M30D2M3D24M7D5M18D3M8D10M4D4M1D19M1I5M7D10M9D4M2D4M7D13M5I5M13D7M2D5M10I5M31D14M27D12M15D6M11D12M4D7M3D3M1I1M3D9M2D6M1D8M2I10M7D3M10D6M4I21M2I2M3D4M5I6M9I7M1D6M6D11M5D3M8D6M9D7M20D4M1I16M68D5M61D9M25D7M6D5M2D2M2I3M1D7M7D4M14D5M14D4M1I6M18D3M8D9M31D11M3D5M53D3M12D7M22D3M5D8M17D6M1I4M1I3M5D9M129D4M1I5M73D888M6I1589M2I1156M2I4702M1D649M2I2957M12I1730M1D3647M6D2438M1I1614M28D220M8I4424M882I5215M1D1867M1I457M9I3621M2I59M4D996M3D852M1I372M136921H
bam sv picard api • 3.8k views
ADD COMMENT
2
Entering edit mode
11.4 years ago
William ★ 5.3k

Turns out to be quite easy with the picard API. Just had to add some extra code.

 File inputBam = new File("/home/william/BACS.bam"); 

    SAMFileReader bamreader = new SAMFileReader(inputBam);

    for(SAMRecord samRecord : bamreader)
    {
        //skip alignment with low mapping qual
        if(samRecord.getMappingQuality() < 60 ){continue;}

        //get the start of the alignment
        String currentChrom = samRecord.getReferenceName();
        Integer currentReferencePos = samRecord.getAlignmentStart();

        for (CigarElement cigarElement :samRecord.getCigar().getCigarElements())
        {
            if(cigarElement.getOperator() == CigarOperator.D)
            {
                Integer cigarElementStart = currentReferencePos;
                Integer cigarElementLenght = cigarElement.getLength();
                Integer cigarElementEnd = currentReferencePos+cigarElementLenght;

                if(cigarElementLenght > 100 )
                {
                    System.out.println("deletion found from " + currentChrom +":"  + cigarElementStart + "-" + cigarElementEnd );
                }  
            }

            //add the lenght of this cigarElement to the current pos to go to the start of the next element. Only if the cigar element consumes reference bases 
            if(cigarElement.getOperator().consumesReferenceBases())
            {
                 currentReferencePos = currentReferencePos + cigarElement.getLength();
            } 
        }            
    }
ADD COMMENT
2
Entering edit mode
11.4 years ago
Andreas ★ 2.5k

You might want to check out Pysam's AlignedRead class. It has a very useful property called aligned_pairs, which will generate a list of aligned read and reference positions for your read based on its CIGAR string.

Andreas

ADD COMMENT

Login before adding your answer.

Traffic: 2226 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6