Question

Paired-end alignment with tophat2: expected (mean) inner distance between mate pairs


7.4 years ago

kpr ▴ 80

How do I find the expected (mean) inner distance between mate pairs?

I am having an issue with tophat and I think it is because I have 150bp paired end data and I didn't change the -r argument for tophat.

tophat -r ???? ref_genome file-1.fastq file-2.fastq

I've been looking at this post:

paired-end alignment with tophat2

I have my QC report, but I am not sure what I am looking at.

RNA-Seq • 1.7k views

ADD COMMENT • link updated 7.4 years ago by Carlo Yague 9.0k • written 7.4 years ago by kpr ▴ 80


You should know that the old 'Tuxedo' pipeline of Tophat(2) and Cufflinks is no longer the advisable toolset for RNA-seq analysis. The software is deprecated or in low-maintenance mode and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or bbmap, or pseudo-alignment using salmon.

Please stop using Tophat https://t.co/Es4ohxOEyx Cole and I developed the method in *2008*. It was greatly improved in TopHat2 then HISAT & HISAT2. There is no reason to use it anymore. I have been saying this for years yet it has more citations this year than last #methodsmatter
— Lior Pachter (@lpachter) December 2, 2017

ADD REPLY • link 7.4 years ago by WouterDeCoster 48k

score 0 · Answer 1 · 2018-01-09

You can infer the mean inner distance between mates in different ways:

1. Experimental: The inner distance depends directly on the fragment sizes of the cDNA library, which can be measured by running the library on a Bioanalyzer chip. You can then estimate the inner distance as Inner_Dist = Fragment_Size - (Adapters + Index) - (2 x Read_Length)

2. In silico: After mapping the reads (using the default -r value, for instance), you can infer the mean inner distance from the concordantly mapped pairs. The 9th field of the SAM file (TLEN) and Picard CollectInsertSizeMetrics both report the full template length, from the start of one mate to the end of the other, so subtract twice the read length to obtain the inner distance.
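Both approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example numbers, and the `max_tlen` cutoff (meant to skip pairs whose mates span an intron) are made up for this example, reads are assumed to be untrimmed and of equal length, and TLEN is treated as the full template length including both reads.

```python
import statistics

PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned


def inner_distance_from_library(fragment_size, adapter_index_length, read_length):
    """Experimental estimate: Fragment_Size - (Adapters + Index) - 2 x Read_Length."""
    return fragment_size - adapter_index_length - 2 * read_length


def inner_distances_from_sam(lines, read_length, max_tlen=1000):
    """In silico estimate: one inner distance per properly paired fragment.

    max_tlen is a hypothetical cutoff, not a tophat option; it drops pairs
    whose genomic span is inflated by an intron between the two mates.
    """
    distances = []
    for line in lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])  # 9th SAM field
        # TLEN is positive for the leftmost mate and negative for the other,
        # so keeping tlen > 0 counts each pair exactly once.
        if flag & PROPER_PAIR and 0 < tlen <= max_tlen:
            distances.append(tlen - 2 * read_length)
    return distances


if __name__ == "__main__":
    # Made-up library numbers, for illustration only.
    print(inner_distance_from_library(450, 120, 150))

    # Tiny synthetic SAM records; in practice, iterate over an open SAM file.
    demo = [
        "@HD\tVN:1.0\n",
        "r1\t99\tchr1\t100\t50\t150M\t=\t300\t350\tACGT\tIIII\n",
        "r1\t147\tchr1\t300\t50\t150M\t=\t100\t-350\tACGT\tIIII\n",
        "r2\t99\tchr1\t500\t50\t150M\t=\t720\t370\tACGT\tIIII\n",
    ]
    dists = inner_distances_from_sam(demo, read_length=150)
    print(statistics.mean(dists), statistics.stdev(dists))
```

The resulting mean would be the value to pass to tophat's -r (--mate-inner-dist) option, and the standard deviation could go to --mate-std-dev. If the reads were trimmed, using each read's actual length rather than a fixed `read_length` would be more accurate.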