Going From Cigar String In Sam To Genomic Coordinates?
2
4
Entering edit mode
14.4 years ago
User 9996 ▴ 840

How can I go from a CIGAR string, given in the SAM output format, to a set of start/end genomic coordinates for paired-end sequences? The SAM format gives the start coordinate but I need to find the end coordinate as well. Thanks.

sam parsing cigar alignment • 11k views
ADD COMMENT
8
Entering edit mode
14.4 years ago

Have a look at the Cigar.java source code in Picard. For example:

/**
 * @return The number of reference bases that the read covers, including padding.
 */
public int getPaddedReferenceLength() {
    int length = 0;
    for (final CigarElement element : cigarElements) {
        switch (element.getOperator()) {
            case M:
            case D:
            case N:
            case EQ:
            case X:
            case P:
                length += element.getLength();
        }
    }
    return length;
}
ADD COMMENT
2
Entering edit mode

Once you have length, the position of the last base is: end = start + length - 1. Don't forget the minus one, since we're "counting fenceposts" (google it).

ADD REPLY
4
Entering edit mode
14.4 years ago

The bamToBed program in BEDTools will do this by create BED entries for each BAM alignment where the end end coordinate reflects the CIGAR string. Moreover, if you want to create separate BED entries for "spliced" or "split" alignments (i.e., when there is an "N: CIGAR op present), use the -split option.

ADD COMMENT
2
Entering edit mode

Just remember, BED entries are 0-indexed, half-open ranges (up to but not including the end position). A 5 bp read starting at (1-based) position 10 would be "chr1 9 15" in BED.

ADD REPLY
0
Entering edit mode

Hi

I am trying the split option on my bam file and it is only reporting 6 fields in the output and that too for all the alignments. Can you suggest what is going wrong??

ADD REPLY

Login before adding your answer.

Traffic: 1782 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6