How can I know that a read is mapping to forward strand or reverse strand ?
How can I know that a read is mapping to forward strand or reverse strand ?
A sam record contains a binary flag (aka SAM-FLAG). The bit for 16 is set if the read is mapped on the reverse strand
see the SAM specification http://samtools.sourceforge.net/SAM1.pdf ( section "FLAG: bitwise FLAG")
see also "explain SAM flag " http://picard.sourceforge.net/explain-flags.html
"explain SAM flag" has moved on https://broadinstitute.github.io/picard/explain-flags.html
One way to extract stranded reads from a BAM file:
samtools view Reads.bam | gawk '(and(16, $2))' > forwardStrandReads.sam
samtools view Reads.bam | gawk '(! and(16, $2))' > reverseStrandReads.sam
Won't this and Alex's solution will only work if you already have filtered away the unmapped reads using the -F 4
option? Otherwise for with the -F 16
option in SAMtools view or 'not containing 16' filtering by gawk you just get everything that isn't mapping to the reverse strand which could be more than those that map to the forward strand.
This post seems to work for single paired reads to sort forward reads without that initial step because it excludes anything that is mapping to reverse strand or unmapped.
Pierre and Alex's answers are great.
But flag :
16 or 0x10 means SEQ being reverse complemented
Which means it should be:
samtools view Reads.bam | gawk '(and(16, $2))' > reverseStrandReads.sam
samtools view Reads.bam | gawk '(! and(16,$2))' > forwardStrandReads.sam
I found it's strange when I grep primer in forward and reverse sam ( most forward primer in reverse sam, and vice versa ), it comes out to be misunderstanding of the flag.
wc -l *sam.out
2551 ATACTGGTATGTATTTAACCATGCAGATCC.forwardStrandReads.sam.out
1001 TTTCAGTTGATTTGCTTGAGATCAAGATTG.reverseStrandReads.sam.out
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hello everyone,
This post is really interesting, I actually need to analyse (ht-seq counts) .bam files (sorted and indexed) that have been mapped by someone else before me. My files are as this form: - sample.merged.mdup.forward.bam - sample.merged.mdup.reverse.bam.
When looking for ht-seq counts tutorials, there was only one .bam (and the .gff) as input... ->Does someone, working on paired-end mode, have any idea of how can we manipulate the forward.bam and the reverse.bam for counting step
Many thanks in advance