Question

Weird results in TopHat2 paired-end alignment


Asked 4.1 years ago by Aspire

I have aligned paired-end reads with TopHat2. In the resulting BAM file, some reads are flagged "read mapped in proper pair" (their FLAG includes bit 2, i.e. 0x2), yet the two mates map to different chromosomes (!).

I ran TopHat2 with --mate-inner-dist -139 and --mate-std-dev 50. Unless I misunderstand the definitions of these parameters, could the negative mate inner distance have messed something up?
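For reference, the TopHat manual describes the inner distance as the gap between the two mates: for 300 bp fragments with 50 bp ends it suggests setting -r (--mate-inner-dist) to 200. The sketch below only does that arithmetic. A negative value such as -139 simply means the mates are expected to overlap by that many bases. The 150 bp read length in the example is a hypothetical value, not taken from this run.

```python
def mate_inner_distance(fragment_len: int, read1_len: int, read2_len: int) -> int:
    """Gap between the inner ends of two mates (negative when they overlap)."""
    return fragment_len - read1_len - read2_len


def implied_fragment_length(inner_dist: int, read1_len: int, read2_len: int) -> int:
    """Invert the relation above: fragment = inner distance + both read lengths."""
    return inner_dist + read1_len + read2_len


def mate_overlap(inner_dist: int) -> int:
    """Number of bases shared by the two mates; 0 if they do not overlap."""
    return max(0, -inner_dist)


# TopHat manual example: 300 bp fragments, 50 bp ends -> inner distance 200.
print(mate_inner_distance(300, 50, 50))         # 200
# Hypothetical 150 bp reads with --mate-inner-dist -139:
print(implied_fragment_length(-139, 150, 150))  # 161
print(mate_overlap(-139))                       # 139
```

So a negative mate inner distance is a legitimate setting for libraries whose fragments are shorter than the two reads combined; by itself it says nothing about which chromosome each mate lands on.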

I think that a read "mapped in proper pair" is the same as "concordant alignment". The definition of the latter is:

A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align "concordantly".
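The definition above can be written as a small predicate. This is a minimal sketch of the definition only, not TopHat's or Bowtie2's actual implementation. It assumes the "fr" mate orientation (leftmost mate forward, rightmost mate reverse), computes the inner distance from the leftmost mate's aligned reference span, and accepts inner distances within a hypothetical n_sd = 3 standard deviations of the mean. The `Mate` class is likewise a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    chrom: str      # reference name (SAM RNAME)
    pos: int        # 1-based leftmost aligned position (SAM POS)
    ref_span: int   # number of reference bases covered by the alignment
    reverse: bool   # True if the read aligned to the reverse strand


def is_concordant(a: Mate, b: Mate, inner_mean: float, inner_sd: float,
                  n_sd: float = 3.0) -> bool:
    """Same chromosome, 'fr' orientation, inner distance within mean +/- n_sd * sd."""
    # Mates on different chromosomes can never be concordant.
    if a.chrom != b.chrom:
        return False
    left, right = sorted((a, b), key=lambda m: m.pos)
    # Assumed 'fr' orientation: leftmost mate forward, rightmost mate reverse.
    if left.reverse or not right.reverse:
        return False
    # Inner distance: start of the right mate minus the end of the left mate
    # (negative when the mates overlap).
    inner = right.pos - (left.pos + left.ref_span)
    return abs(inner - inner_mean) <= n_sd * inner_sd


# The pair from the question fails at the very first check.
print(is_concordant(Mate("1", 91387362, 117, True),
                    Mate("21", 8218147, 112, True), -139, 50))  # False
```

Whatever tolerance is chosen for the distance, the chromosome check comes first, so under this definition the reads shown below should not carry the proper-pair bit.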

These are the two mates of one such pair, taken from the mapped file:

A01056:33:HF3NFDSXY:1:2516:13657:30718  435     1       91387362        0       117M    21      8218147 0       CCTGTGGTAACTTTTCTGACACCTCCTGCTTAAAACCCAAAAGGTCAGAAGGATCGTGAGGCCCCGCTTTCACGGTCTGTATTCGTACTGAAAATCAAGATCAAGCGAGCTTTTGCC   :FF:F:FFFF:FFFFFFFFFFFFFF:FF,FFF,FFFFFF:FFF:FFFFF:FF:FF:FFFFFFF:FFFFFFFFF:FFFFFFFFF,FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:117        YT:Z:UU NH:i:20 CC:Z:=  CP:i:91387362   XS:A:-  HI:i:2


A01056:33:HF3NFDSXY:1:2516:13657:30718  371     21      8218147 0       112M    1       91387362        0       GGGCAAAAGCTCGCTTGATCTTGATTTTCAGTACGAATACAGACCGTGAAAGCGGGGCCTCACGATCCTTCTGACCTTTTGGGTTTTAAGCAGGAGGTGTCAGAAAAGTTAC        :F:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFF:FFFF::FFFFFFFFFF:FFFFFFFFFFFFFFFFFFF        AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:112        YT:Z:UU NH:i:20 CC:Z:GL000220.1 CP:i:161594     XS:A:+  HI:i:2
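To make the FLAG values above easier to read, here is a small sketch that decodes them using the bit meanings from the SAM specification (the names follow `samtools flags`). It also checks, per record, whether bit 0x2 is set while RNEXT points to a different reference. Only the first eight columns are parsed; the helper names are hypothetical.

```python
from __future__ import annotations

# SAM FLAG bits from the SAM specification, named as in `samtools flags`.
SAM_FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def proper_pair_on_other_chrom(sam_line: str) -> bool:
    """True if bit 0x2 is set but the mate (RNEXT) is on another reference."""
    fields = sam_line.split(maxsplit=7)  # only QNAME..PNEXT are needed here
    flag, rname, rnext = int(fields[1]), fields[2], fields[6]
    return bool(flag & 0x2) and rnext not in ("=", rname)


# First eight columns of the two records shown above.
rec1 = "A01056:33:HF3NFDSXY:1:2516:13657:30718 435 1 91387362 0 117M 21 8218147"
rec2 = "A01056:33:HF3NFDSXY:1:2516:13657:30718 371 21 8218147 0 112M 1 91387362"
for rec in (rec1, rec2):
    flag = int(rec.split()[1])
    print(flag, decode_flag(flag), proper_pair_on_other_chrom(rec))
```

Both records decode to PAIRED, PROPER_PAIR, REVERSE, MREVERSE and SECONDARY (plus READ2 for 435 and READ1 for 371), and both are reported as "proper pair on another chromosome". Note that both are secondary alignments (bit 0x100, MAPQ 0, NH:i:20), i.e. one of 20 reported hits for each read.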

And this is the command used to generate the alignment:

tophat --mate-inner-dist -139 --mate-std-dev 50 -o align/Sample10 -G /.../Homo_sapiens/Ensembl/GRCh38/Annotation/Genes/genes.gtf -N 10 --read-gap-length 5 --read-edit-dist 15 --segment-length 20 --read-realign-edit-dist 3 --no-coverage-search --library-type fr-firststrand -p 32 /.../Homo_sapiens/Ensembl/GRCh38/Sequence/Bowtie2Index/genome processed/Sample10_R1_clean_pe.fastq.gz processed/Sample10_R2_clean_pe.fastq.gz,processed/Sample10_R1_clean_se.fastq.gz,processed/Sample10_R2_clean_se.fastq.gz

(The _pe files contain the paired reads. The _se files were also sequenced paired-end, but during the pre-processing/cleaning step only one read of each pair survived.)

Tags: RNA-Seq, TopHat2, paired-end



FYI: https://twitter.com/lpachter/status/937055346987712512?lang=fr

Reply by Nicolas Rosewick, 4.1 years ago


Thanks, but I'd still be glad if anyone could help me with this issue.

Reply by Aspire, 4.1 years ago


For most records in this file, however, the problem is not the SAM FLAG field but the YT:Z:UU tag.

In this run, TopHat received both paired-end (PE) and single-read (SR) input; about 98% of the reads were PE. Despite that, in a subsample of the output, about 98% of the alignments carry the YT:Z:UU tag, which marks a read as not part of a pair.

The thread "Is There An Explanation For This Tophat 'Yt' Descriptor Discrepancy In My Sam Output?" was on a similar topic.
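To quantify the mismatch between the FLAG field and the YT tag, one could cross-tabulate the proper-pair bit against the YT:Z value. The YT meanings below are those given in the Bowtie2 manual (TopHat2 runs Bowtie2 underneath); whether TopHat2 keeps them consistent in its final output is exactly what is in question, so this sketch only counts what is in the file. It expects tab-separated SAM text such as the output of `samtools view`; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Meaning of the YT:Z tag as documented in the Bowtie2 manual.
YT_MEANING = {
    "UU": "not part of a pair",
    "CP": "pair aligned concordantly",
    "DP": "pair aligned discordantly",
    "UP": "part of a pair, aligned neither concordantly nor discordantly",
}


def tally_yt(sam_lines: Iterable[str]) -> Counter:
    """Count records by (proper-pair bit set?, YT:Z value) from SAM text."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Optional tags start at column 12 (index 11); "none" if YT is absent.
        yt = next((f[5:] for f in fields[11:] if f.startswith("YT:Z:")), "none")
        counts[(bool(flag & 0x2), yt)] += 1
    return counts


def print_tally(sam_lines: Iterable[str]) -> None:
    """Print one line per (proper_pair, YT) combination with its count."""
    for (proper, yt), n in sorted(tally_yt(sam_lines).items()):
        meaning = YT_MEANING.get(yt, "no YT tag")
        print(f"proper_pair={proper}\tYT={yt} ({meaning})\t{n}")
```

For example, a script (hypothetical name `tally_yt.py`) that calls `print_tally(sys.stdin)` could be fed with `samtools view align/Sample10/accepted_hits.bam | python tally_yt.py`. A large `proper_pair=True YT=UU` count would confirm that the two fields disagree for most records.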

Reply by Aspire, 4.1 years ago