Tophat 2 - Both Pairs Map Concordantly
0
2
Entering edit mode
12.3 years ago
AW ▴ 350

Hi,

I have just run the following command

tophat2 --solexa1.3-quals -p 12 -r 80 --max-multihits 1 --no-mixed --no-discordant /home/Turkey/Index/turkeyindex /home/Turkey/WTCHG24920061sequence1.fa /home/Turkey/WTCHG24920062sequence_2.fa

I understand this is instructing Tophat 2 to map the reads so that no multiple hits are allowed, and both pairs of reads have to map concordantly.

However, when I examine the stats of the output in Bamtools I get the following

Total reads: 33102389
Mapped reads: 33102389 (100%)
Forward strand: 16600010 (50.1475%)
Reverse strand: 16502379 (49.8525%)
Failed QC: 0 (0%)
Duplicates: 0 (0%)
Paired-end reads: 33102389 (100%)
'Proper-pairs': 28865628 (87.201%)
Both pairs mapped: 30696760 (92.7328%)
Read 1: 16559354
Read 2: 16543035
Singletons: 2405629 (7.26724%)

Why do only 92.7% of the reads fall under both pairs are mapped and 7.2% are singletons? I would expect there to be no singleton reads and 100% where both pairs have mapped, given the options I specified?

Any help would be much appreciated.

Thanks

tophat paired-end read • 4.6k views
ADD COMMENT
0
Entering edit mode

I don't know the answer, but are you sure that you want to eliminate unpaired reads? When you're using tophat, even single ends will be split and mapped and will help define exon-exon boundaries (or may span fusions)

ADD REPLY
0
Entering edit mode

that's a question for the developers, but in any case, you can filter the singletons:

samtools view file.bam | perl -lane 'print if ($F[6] eq "=")' > filtered.sam
ADD REPLY
0
Entering edit mode

thank you very much for all the helpful comments. When I try this command, and try to view the output in samtools it says it is not a value sam file as it cannot detect headers? Why might this be?

Thanks

ADD REPLY
0
Entering edit mode

That's because the header is omitted, to keep the header you need to do this:

  samtools view -h file.bam | perl -lane 'print if ($F[6] eq "=" or m/^\@/)' > filtered.sam

The -h flag shows the header and the "m/^\@/" will keep those lines from the filtering.

This output is a simple text SAM, to convert againt to BAM:

  samtools view -bS filtered.sam > filtered.bam
ADD REPLY
0
Entering edit mode

The stats indicate 100% of reads mapped but some mapped as singletons. Perhaps this has to do with the definitions of 'concordant', 'discordant', and 'proper pair'. Perhaps some alignments are being considered concordant (same chromosome) but are not properly paired according to orientation and insert size and are being counted as singletons for that reason? Maybe view and post the alignments for some of these reads...

ADD REPLY

Login before adding your answer.

Traffic: 1886 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6