Question

paired-end alignment with tophat2

0

9.6 years ago

zizigolu ★ 4.4k

Hi,

I have 12 paired-end fq files.

In the tophat2 protocol, the syntax is:

tophat -p 8 -G genes.gtf genome file-1.fq file-2.fq

but I have read about the -r option for paired-end data, used like this:

tophat -r 200 file-1.fq file-2.fq

I am about to start using tophat, so could someone please tell me whether I should use -r or not? If I should, why didn't the protocol by Trapnell et al. mention -r at all?

Thank you

RNA-Seq tophat next-gen-sequencing • 7.7k views

updated 3.1 years ago by Ram 45k • written 9.6 years ago by zizigolu ★ 4.4k

Ram · Answer 1 · 2016-02-10

2

9.6 years ago

Daniel ★ 4.0k

The manual clearly says:

-r/--mate-inner-dist <int>

This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. The default is 50bp.

Only you can know what the insert between your reads is. The option stops forward and reverse reads that are separated by an inappropriate distance from being mapped, although I don't know how strictly tophat sticks to the number.
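The arithmetic in the manual's example can be sketched in shell as follows. This is only an illustration: the function name `mate_inner_dist` is made up for this example and is not part of tophat, and it assumes the fragment length you plug in excludes adapters.

```shell
# Inner distance (-r) = fragment length - 2 * read length, all in bp.
# mate_inner_dist is a hypothetical helper, not a tophat command.
mate_inner_dist() {
    fragment_len=$1
    read_len=$2
    echo $(( fragment_len - 2 * read_len ))
}

mate_inner_dist 300 50    # the manual's example: 300bp fragments, 2 x 50bp reads -> 200
```

The printed value is what would be passed to -r for that library.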

updated 3.1 years ago by Ram 45k • written 9.6 years ago by Daniel ★ 4.0k

0

Thank you, but how should I know the length of each end or of my fragments? Also, I read something about comma separation, but there is no comma in the tophat manual:

tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]
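In that usage line, the square brackets mark optional extra files, and the files within each group are separated by commas. A rough sketch of building such a command is below. The file names and the `join_by_comma` helper are hypothetical, and the command is only printed, not run. As far as I understand, files joined with commas are aligned together as one sample, so separate samples would normally get separate tophat runs.

```shell
# Join all arguments with commas (hypothetical helper, runs in a subshell
# so the changed IFS does not leak into the calling shell).
join_by_comma() (
    IFS=,
    echo "$*"
)

# Hypothetical file names: two lanes of the same sample.
left=$(join_by_comma lane1_1.fq lane2_1.fq)
right=$(join_by_comma lane1_2.fq lane2_2.fq)

# Print the command for inspection instead of executing it.
echo "tophat -p 8 -G genes.gtf genome $left $right"
```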

updated 3.1 years ago by Ram 45k • written 9.6 years ago by zizigolu ★ 4.4k

1

Use the mean fragment length from your bioanalyser QC prior to sequencing (or similar) minus the total sequenced length that you used (e.g. 2 x 100bp).

9.6 years ago by Daniel ★ 4.0k

0

Actually, I don't have any information about the protocol, so I used 200.

updated 3.1 years ago by Ram 45k • written 9.6 years ago by zizigolu ★ 4.4k

1

I think you need to find out how your experiment was performed before you do any more analysis; otherwise the results could be useless.

For example, if you sonicated your samples to 300bp and did 150bp paired-end sequencing, your inner distance (-r) is 300 - 2 x 150 = 0.

If you sonicated your samples to 600bp and did 75bp paired-end sequencing, your inner distance (-r) is 600 - 2 x 75 = 450.

You MUST know this before just choosing a number at random. If you don't have information on the protocol, find out the information.

updated 3.1 years ago by Ram 45k • written 9.6 years ago by Daniel ★ 4.0k

0

Thank you. I received an email from the company: the insert size was 248-277 bp, so I used 200. Did I do something wrong?

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 9.6 years ago by zizigolu ★ 4.4k

1

You don't need me to tell you that 200 does not equal 248-277. However I don't know how strictly tophat uses the parameter. You'll have to read up on that to see if it could affect your alignment.
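For reference, here is a hedged sketch of how a reported insert-size range could be turned into a value for -r. It assumes the reported insert size is the fragment length without adapters, takes the midpoint of the range as the mean, and uses a placeholder read length of 100bp because the read length is not stated anywhere in this thread. The function name is hypothetical, and the tophat command is only printed, not run.

```shell
# Midpoint of the reported insert-size range minus both reads (integer bp).
# inner_dist_from_range is a hypothetical helper, not part of tophat.
inner_dist_from_range() {
    min_insert=$1
    max_insert=$2
    read_len=$3
    mean_insert=$(( (min_insert + max_insert) / 2 ))
    echo $(( mean_insert - 2 * read_len ))
}

# READ_LEN is a placeholder assumption; replace it with the real read length.
READ_LEN=100
r=$(inner_dist_from_range 248 277 "$READ_LEN")

# Print the resulting command for inspection instead of executing it.
echo "tophat -p 8 -r $r -G genes.gtf genome file-1.fq file-2.fq"
```

With the placeholder read length, the midpoint is 262 and the sketch prints a command with -r 62; a different read length changes the value accordingly. The manual also documents --mate-std-dev for the spread around that mean.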

updated 3.1 years ago by Ram 45k • written 9.6 years ago by Daniel ★ 4.0k

0

So when the insert size across my samples ranges from 248 to 277, what do you suggest I set?

updated 3.1 years ago by Ram 45k • written 9.6 years ago by zizigolu ★ 4.4k

1

Apologies if this is blunt, but please read the manual and stop asking me to read the manual for you.