Why Does TopHat Not Map Some Reads?
11.3 years ago
alteralex ▴ 40

Hi, I am trying to use TopHat for my RNA-seq analysis. I tested my paired-end (PE) reads with three protocols.

1, No reference gene annotation (GTF file). I noticed that some PE reads are correctly mapped, with flags 83 and 163. For example:

M01339:30:000000000-A42G7:1:1101:15690:1356 163 chr8 126142440 0 32M = 126142503 96 TACAGCACCCGGTATTCCCAGGCGGTCTCCCA $$$$$%%%&&&&"$%&(((((('%'( AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU NH:i:9 HI:i:8

M01339:30:000000000-A42G7:1:1101:15690:1356 83 chr8 126142503 0 33M = 126142440 -96 GCTTCCGAGATCAGACGAGATCGGGCGCGTTCA '''&''((%'$'''#'''#&&#&"&&$'&'$"" AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:33 YT:Z:UU NH:i:9 HI:i:8

2, I provided the GTF file downloaded from iGenomes. The other parameters were the same, but then some of the PE reads lost their pairing in the mapping:

M01339:30:000000000-A42G7:1:1101:15690:1356 89 chr8 126142503 0 33M * 0 0 GCTTCCGAGATCAGACGAGATCGGGCGCGTTCA '''&''((%'$'''#'''#&&#&"&&$'&'$"" AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:33 YT:Z:UU NH:i:20 HI:i:19

Please note that the mate that previously mapped to chr8:126142440 has lost its mapping here!

3, I added another option, "--library-type fr-unstranded". This time, both mappings are gone! The following is found in the unmapped file:

M01339:30:000000000-A42G7:1:1101:15690:1356 69 * 0 255 * * 0 0 TGAACGCGCCCGATCTCGTCTGATCTCGGAAGC ""$'&'$&&"&#&&#'''#'''$'%((''&'''

M01339:30:000000000-A42G7:1:1101:15690:1356 133 * 0 255 * * 0 0 TACAGCACCCGGTATTCCCAGGCGGTCTCCCA $$$$$%%%&&&&"$%&(((((('%'(
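
For reference, the FLAG values in the records above (83, 163, 89, 69 and 133) can be decoded bit by bit. The following is a minimal sketch, not part of TopHat itself: the helper name `decode_flag` and the short bit labels are my own, chosen to mirror the FLAG table in the SAM format specification.

```python
# Illustrative helper (hypothetical name); bit meanings follow the
# FLAG table of the SAM format specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# Flags seen in the three runs described above.
for flag in (83, 163, 89, 69, 133):
    print(flag, ", ".join(decode_flag(flag)))
```

Read this way, 83 and 163 describe a properly paired alignment, 89 is a first mate aligned to the reverse strand whose mate is unmapped, and 69 and 133 are the two mates of a pair in which both reads are unmapped.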

Could anyone give me some insight? Thank you in advance!

tophat gtf • 2.5k views