Question

Toy Tophat Example, finding a single junction from synthetic data

0

Entering edit mode

9.7 years ago

irritable_phd_syndrome ▴ 130

I am trying to get a better understanding of how tophat works. To do this I have created some toy examples. I am looking at a splice junction between two exons in chr9 of hg38. Exon 1 and exon 2 are at base positions 133261900 - 133262400 and 133274600 - 133276150 respectively. Below is my synthetic data

Read 1 (R1.fastq):

@exon 2, line 04...
ATTATTAGGAAAAGGATCATAGGTCGAAGTGCGTGGCATTTTGGTTTTCC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@near exon 1...
GCCTGTTGCCCAGGTGAGGCTAAAATAAGGGAGCAAGTAAGCACACCCTC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Read 2 (R2.fastq):

@exon 1, line 05-06...         # starts here
GGCTCCGCGTCTGGTCTCGGCCTCCGCCTTCCGCCCGGCGGCCCTGGGGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@near exon 1
CGCCAGAGCTGGACGCAGGCAATAACTGTCCTTAGGACCCTGATAACTGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

The gene is on the opposite strand with respect to the reference and I am trying to emulate paired end reads. The first entry of R1.fastq is part of exon 2 and first entry of R2.fastq is part of exon 1. The result of running

tophat2 -p 2 -o . --library-type fr-unstranded /path/to/hg38/Bowtie2Index/genome R1.fastq R2.fastq

yields 2 discordant pair alignments, but no junctions. Replacing the first line of R2.fastq with its reverse complement yields 2 concordant pair alignments, but no junctions.

Question : How would I modify my data set to get tophat to recognize a junction?

RNA-Seq tophat • 2.2k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.7 years ago by irritable_phd_syndrome ▴ 130

score 0 · Answer 1 · 2015-10-31

0

Entering edit mode

9.7 years ago

Devon Ryan 105k

Tophat (and similar aligners) identify junctions by finding individual reads that are spliced. That is, the splicing needs to occur somewhere in the middle of the read. So, just modify one of the reads to be longer and overlap both exon 1 and 2.

ADD COMMENT • link 9.7 years ago by Devon Ryan 105k