I have an RNASeq sample with mouse data that started with a very low amount of RNA. We did a test run on the MiSeq, trimmed the reads to 75bp, and ran it through TopHat just to see how it's looking.
For the most part it looks fine: there is no bias toward the 3' end of genes, and the mapping rate is 75% (low but not terrible). However, 26% of the aligned reads are discordant, which leaves me with an overall concordant mapping rate of 50%. Is this okay? What could cause this high-ish discordance rate, and should I worry about it? I can't seem to find concordant alignment rates reported in much of the RNASeq literature, so I'm not sure whether this is something a reviewer would wonder about.
Thanks!
What is the insert size distribution?
Are your discordant reads overlapping? If you started with very little RNA, you might have shorter cDNA fragments after fragmentation/tagmentation/whatever. You can check your insert size either on a Bioanalyzer or computationally from the aligned reads using something like Picard's CollectInsertSizeMetrics.
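For a rough computational look at insert sizes without extra tools, the TLEN column of the SAM records can be summarized directly. The following is only a minimal sketch: the function names `template_lengths` and `summarize` are made up for illustration, and it assumes plain SAM text lines as input (for example from `samtools view`). Keep in mind that for spliced RNA-seq alignments TLEN spans introns, so these values will overestimate the true cDNA fragment length for pairs that cross a splice junction.

```python
import statistics


def template_lengths(sam_lines):
    """Yield the template length (TLEN) once per pair from SAM text lines.

    Hypothetical helper: keeps only paired, primary records where both mates
    are mapped and TLEN is positive (i.e. the leftmost mate of each pair).
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if not flag & 0x1:                # not a paired read
            continue
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue                      # unmapped read/mate, secondary, supplementary
        if tlen <= 0:                     # rightmost mate, or mates on different references
            continue
        yield tlen


def summarize(lengths):
    """Return basic summary statistics, or None if there are no values."""
    values = sorted(lengths)
    if not values:
        return None
    return {
        "n": len(values),
        "median": statistics.median(values),
        "min": values[0],
        "max": values[-1],
    }
```

On real data, the lines could come from `samtools view` on the TopHat output piped into a small wrapper that passes `sys.stdin` to `template_lengths`. Comparing the summary between this sample and the others would show whether its fragments are actually shorter.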
It may be. The fragment sizes are ~300bp including adapter (so ~150 without). But the other samples have the same size distribution and don't have this problem.
Is there a quick way to check if discordant reads are overlapping or do I need to write a script?
I used bamUtil's clipOverlap to create a file of only the overlapping reads, and used samtools flagstat to count them. The percentage of overlapping reads in this sample is not greater than in other samples, so I don't think that's the issue.
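For anyone who would rather script this check, here is a minimal sketch that counts overlapping mates separately for concordant and discordant pairs. It is not what clipOverlap does internally; it is a stand-in under some assumptions: the input is SAM text lines (for example from `samtools view`), the aligner sets the proper-pair flag (0x2) on concordant pairs, and only primary alignments are considered so that each read name appears at most twice. The names `aligned_blocks`, `blocks_overlap`, and `count_overlapping_pairs` are made up for illustration.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return half-open reference intervals covered by aligned bases.

    Introns (N) and deletions (D) advance the position without adding a
    block, so a spliced read does not 'cover' its intron.
    """
    blocks = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            blocks.append((pos, pos + length))
            pos += length
        elif op in "DN":
            pos += length
    return blocks


def blocks_overlap(a, b):
    """True if any interval in a intersects any interval in b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)


def count_overlapping_pairs(sam_lines):
    """Count pairs and overlapping pairs, split by the proper-pair flag."""
    counts = {
        "concordant": {"pairs": 0, "overlapping": 0},
        "discordant": {"pairs": 0, "overlapping": 0},
    }
    waiting = {}  # read name -> first mate seen
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue
        flag = int(f[1])
        # keep paired, primary records where both mates are mapped
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        rec = (f[2], aligned_blocks(int(f[3]), f[5]), bool(flag & 0x2))
        mate = waiting.pop(f[0], None)
        if mate is None:
            waiting[f[0]] = rec
            continue
        key = "concordant" if rec[2] and mate[2] else "discordant"
        counts[key]["pairs"] += 1
        if rec[0] == mate[0] and blocks_overlap(rec[1], mate[1]):
            counts[key]["overlapping"] += 1
    return counts
```

Dividing `overlapping` by `pairs` within each class gives the fraction of discordant pairs whose mates overlap, which can then be compared across samples.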
It's not clear from your post if you ran FastQC, and if you did, what the output is. This should be the starting point of any quality control.