Samtools Mpileup And Overlapping Paired-End Reads
2
2
Entering edit mode
11.0 years ago
alpha2zee ▴ 120

I am using samtools mpileup to generate a pileup for paired-end RNA sequencing data. I am curious about how samtools handles pair mates whose read mappings overlap. With regard to the simple example below, for positions 7-11, are both pair mates enumerated?

position     1    6    11   16
reference    ATGCATGCATGCATGC
pair mate 1  ATGCATGCATG
pair mate 2'       GCATGCATGC    (reverse complemented)

I am parsing the pileup output to quantify RNA editing at some positions and do not want to count an 'RNA molecule' twice just because the read pair mates 'overlapped.'

Thanks.

pileup paired-end rna-seq • 11k views
ADD COMMENT
1
Entering edit mode

If I have to guess then I think that the overlapping portion or bases will be counted twice.

ADD REPLY
0
Entering edit mode

You might drop your comment as an answer, as I think you are correct.

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Sorry, is cross-posting a no-no? I am trying to reach more people and plan to post here a link to any conclusive answer I get elsewhere.

ADD REPLY
0
Entering edit mode
ADD REPLY
5
Entering edit mode
7.0 years ago
Owen S. ▴ 370

NOTE! This answer from Ashutosh Pandey is now misleading. At least for anyone using samtools >= 1.0 (which was released ~6 months after the answer was posted).

Since 1.0 (released Aug 15, 2014), we have the (somewhat confusingly named) --ignore-overlaps option:

-x, --ignore-overlaps  Disable read-pair overlap detection.

With this option, samtools mpileup 1.x behaves like it did in 0.19, but the default behavior is to NOT DOUBLE COUNT overlapping paired end reads.** It has been this way for the past 3.5 years.

You can try it yourself:

$ samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.0 (using htslib 1.0)

$ cat test1.sam
@HD     VN:1.3  SO:coordinate
@SQ     SN:chr1 LN:248956422
seqid   163     chr1    39415   0       9M      =       39418   12      ATGTGTTTT       AAEE6EEEE
seqid   83      chr1    39418   0       9M      =       39415   -12     TGTTTTGCA       E<EAEEA<<

$ samtools mpileup test1.sam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
chr1    39415   N       1       ^!A     A
chr1    39416   N       1       T       A
chr1    39417   N       1       G       E
chr1    39418   N       1       T       i
chr1    39419   N       1       G       Q
chr1    39420   N       1       T       i
chr1    39421   N       1       T       e
chr1    39422   N       1       T       i
chr1    39423   N       1       T$      i
chr1    39424   N       1       g       A
chr1    39425   N       1       c       <
chr1    39426   N       1       a$      <

$ samtools mpileup -x test2.bam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
chr1    39415   N       1       ^!A     A
chr1    39416   N       1       T       A
chr1    39417   N       1       G       E
chr1    39418   N       2       T^!t    EE
chr1    39419   N       2       Gg      6<
chr1    39420   N       2       Tt      EE
chr1    39421   N       2       Tt      EA
chr1    39422   N       2       Tt      EE
chr1    39423   N       2       T$t     EE
chr1    39424   N       1       g       A
chr1    39425   N       1       c       <
chr1    39426   N       1       a$      <

Note that for this to work right, the paired read names must match (which they do for Illumina data).

ADD COMMENT
1
Entering edit mode
11.0 years ago

The overlapping bases in the paired end reads will be counted for both reads. In the above example, they will be counted twice.

ADD COMMENT
1
Entering edit mode

Thanks. I had a similar response to my cross-post, at http://seqanswers.com/forums/showpost.php?p=123101&postcount=4. The poster suggested clipping overlapping mates with, among other tools, bamUtil/clipOverlap.

ADD REPLY

Login before adding your answer.

Traffic: 2487 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6