Hi, I am trying to use TopHat for my RNA-seq analysis. I tested my paired-end (PE) reads with three protocols.
1, No reference gene annotation (GTF file). I noticed that some PE reads are correctly mapped as proper pairs, with flags 83 and 163. For example:
M01339:30:000000000-A42G7:1:1101:15690:1356 163 chr8 126142440 0 32M = 126142503 96 TACAGCACCCGGTATTCCCAGGCGGTCTCCCA $$$$$%%%&&&&"$%&(((((('%'( AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU NH:i:9 HI:i:8
M01339:30:000000000-A42G7:1:1101:15690:1356 83 chr8 126142503 0 33M = 126142440 -96 GCTTCCGAGATCAGACGAGATCGGGCGCGTTCA '''&''((%'$'''#'''#&&#&"&&$'&'$"" AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:33 YT:Z:UU NH:i:9 HI:i:8
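As a sanity check on the pair above, the template length of 96 can be recomputed from the two positions and read lengths. This is only a minimal sketch: `template_length` is a helper name I made up, and it assumes ungapped alignments (plain `32M` / `33M` CIGARs, as in these two records), so it would not be valid for spliced reads.

```python
def template_length(pos_a, len_a, pos_b, len_b):
    """Unsigned template length for two mates on the same chromosome.

    Assumes 1-based leftmost positions and ungapped alignments, so the
    aligned span of each mate equals its read length (hypothetical helper).
    """
    start = min(pos_a, pos_b)                        # leftmost aligned base
    end = max(pos_a + len_a - 1, pos_b + len_b - 1)  # rightmost aligned base
    return end - start + 1


# Values taken from the two records above (32M at 126142440, 33M at 126142503).
tlen = template_length(126142440, 32, 126142503, 33)
print(tlen)  # 96
```

The result agrees with the 96 / -96 reported in the ninth column, so the pair looks internally consistent in this run.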
2, I provided the GTF file downloaded from iGenomes. The other parameters are the same, but some of the PE reads lost their pairing in the mapping:
M01339:30:000000000-A42G7:1:1101:15690:1356 89 chr8 126142503 0 33M * 0 0 GCTTCCGAGATCAGACGAGATCGGGCGCGTTCA '''&''((%'$'''#'''#&&#&"&&$'&'$"" AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:33 YT:Z:UU NH:i:20 HI:i:19
Please notice that the mate that previously mapped to chr8:126142440 has lost its mapping here!
3, I added another option, "--library-type fr-unstranded". This time, both mappings are gone! The following records were found in the unmapped file:
M01339:30:000000000-A42G7:1:1101:15690:1356 69 * 0 255 * * 0 0 TGAACGCGCCCGATCTCGTCTGATCTCGGAAGC ""$'&'$&&"&#&&#'''#'''$'%((''&'''
M01339:30:000000000-A42G7:1:1101:15690:1356 133 * 0 255 * * 0 0 TACAGCACCCGGTATTCCCAGGCGGTCTCCCA $$$$$%%%&&&&"$%&(((((('%'(
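To make the flags in the three runs easier to compare, here is a minimal sketch that decodes them in plain Python. The bit meanings follow the SAM specification; `SAM_FLAG_BITS` and `decode_flag` are names I made up for illustration, not part of TopHat or samtools.

```python
# Bit meanings as defined in the SAM specification (FLAG column).
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# Flags seen in the three runs above.
for flag in (83, 163, 89, 69, 133):
    print(flag, decode_flag(flag))
```

Read this way, 83 and 163 are the two mates of a proper pair, 89 is read 1 mapped on the reverse strand with its mate unmapped, and 69 and 133 are read 1 and read 2 both unmapped.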
Could anyone give me some insights? Thank you in advance!