Question

Is there a more optimal SAM orientation parsing method?

1

Entering edit mode

9.6 years ago

Daniel ★ 4.0k

I have been working on a script for identifying frequent 'end points' of paired end reads from a SAM file on a genome for determining uneven cut frequencies, which I need to integrate the orientation and 1st/2nd read information into. The 2nd column of a SAM file gives the detail on the orientation with a number (as described here and here).

I hadn't used a bioperl import method because all I was doing was pulling the genomic start position of the read and building a hash by frequency, but I need to now tune it for directionality and mate pair.

My question is whether there is a better way to do this than:

## Numbers are random examples, not correctly done at the moment.
if ($line[1] == 99){
    $coordinate = $start
}elsif($line[1] == 147){
    $coordinate = $start - $readlength
}elsif($line[1] == 83){
    $coordinate = $start + $readlength
}.........etc

Given that a picture paints a thousand words, basically what I want is to pull out the coordinates of the red circles from my SAM file. It's part of a bigger perl script, so I'd prefer to avoid other tools, but if they exist I should look at them.

perl SAM • 2.4k views

ADD COMMENT • link updated 22 months ago by Ram 44k • written 9.6 years ago by Daniel ★ 4.0k

0

Entering edit mode

what is -/+36 the read length ? it 's not as simple for the negative-strand read because you'll have to walk to the cigar string (indel and insertion)...

ADD REPLY • link 9.6 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Sorry, those numbers would vary based upon the read length. I jut know that in my data here, the reads are all trimmed to 36bp.

From my bowtie mapping, the forward and reverse reads are independent, not pair-matched, so I think they can be independently looked at without needing to worry about indels, right? I considered mapping them together but as I'm interested in the end points, it's not essential that everything has a pair and I wanted to avoid throwing any more out.

ADD REPLY • link 9.6 years ago by Daniel ★ 4.0k

0

Entering edit mode

"From my bowtie mapping, the forward and reverse reads are independent, not pair-matched, so I think they can be independently looked at without needing to worry about indels, right? " : no if there is an indel in the read (cigar string contains 'I' or 'D' ) or even clipping you can't just add the values.

ADD REPLY • link 9.6 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Ah, ok I see. I was only thinking of indels between the paired reads. I can include that in the calculation, but I'm starting to think that the bioperl SAM module will be better suited for this.

ADD REPLY • link 9.6 years ago by Daniel ★ 4.0k

Ram · Accepted Answer · 2015-05-13

1

Entering edit mode

9.6 years ago

Damian Kao 16k

You can parse the bitwise flags with the AND operator in perl. For example, the 5th bit in the bitwise flag denotes orientation:

if ($flag & 16) { 
   return 'negative strand';
} else {
   return 'positive strand';
}

The 5th bit is 2^4 = 16. 1st bit is 2^0, 2nd bit is 2^1...

ADD COMMENT • link updated 22 months ago by Ram 44k • written 9.6 years ago by Damian Kao 16k

0

Entering edit mode

I've just had a crash-course in bitwise flags, I see how this works now, thanks.

ADD REPLY • link 9.5 years ago by Daniel ★ 4.0k