Hi,
I have RNAseq data of two strains. In the WT I can see the top1 gene present with an average FPKM of 90. The problem is the mutant, where top1 is deleted: I still see top1 with an average FPKM of 4. That's very low compared to WT, but I want to know whether that's possible or whether my experiment went wrong at some point. Please let me know if you want more information.
How were the FPKM values calculated? If you're using multimappers then that could be the problem.
I use TopHat, following the Trapnell et al. workflow. From a quick search I did just now, I gather TopHat does use multimappers. Is that correct? And if so, is that OK?
Yeah, it'd be good to recalculate things without any multimappers. Also, if there's high enough similarity between top1 and any of the other topoisomerases then their FPKM may be spilling over onto this one (again, due to multimappers).
I did an NCBI BLAST. The highest similarity was 13 exact base pairs and then 25 somewhat similar base pairs. Would that be enough for their FPKM to spill over onto each other?
Perhaps. From the IGV screenshot you posted, it looks like the entirety of the signal is coming from the 3'-most end. What sort of edit distance and MAPQ do the reads that map there have? If they have a number of mismatches then it's likely that you're seeing them there due to how tophat works (i.e., it maps to the transcriptome first, so it can produce somewhat biased alignments in cases like this).
BTW, was the KO made with something like Cre-Lox such that there could be a little residual RNA floating around from prior to the excision event?
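For reference, in SAM-formatted output the MAPQ is the fifth tab-separated column, and the edit distance is carried in the optional NM:i: tag whenever the aligner writes one. Below is a minimal sketch that pulls both out of alignment lines on stdin. The function name mapq_and_nm is made up for illustration, and the usage line in the comment borrows the file name and example coordinates that appear later in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of samtools): for each SAM alignment line on
# stdin, print read name, leftmost position, MAPQ and NM edit distance.
mapq_and_nm() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            nm = "NA"                      # NM is optional; report NA if absent
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i ~ /^NM:i:/) nm = substr($i, 6)
            print $1 "\t" $4 "\t" $5 "\t" nm
        }'
}

# Assumed usage, with coordinates adjusted to the region of interest:
#   samtools view accepted_hits.bam II:2941900-2942000 | mapq_and_nm | less
```

Reads with a low MAPQ or a high NM value in the deleted region would be the ones to look at first.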
How do I look up these MAPQ and edit distance for mapped reads? Some help would be appreciated.
For the KO we used a two-step PCR method, since we work on S. pombe.
You should be able to just
samtools view accepted_hits.bam II:2941900-2942000 | less
, or something like that with more appropriate coordinates.
OK I did
samtools view accepted_hits.bam II:2941980-2941734 | less > 1.txt
and it produced a 2GB file! Is it supposed to be that big?! It includes data as:
That seems wrong, though I'm surprised that anything was written to 1.txt. Try instead
Still a 2.5 GB txt file. How can it be that huge?! Any suggestion?
One more question, how do you run TopHat without multimappers? I couldn't find out by looking at the options. Thanks!
That does seem large; it would appear that something went wrong. You should be able to discern what from the file's contents.
You can't exclude multimappers, you just filter them afterward.
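One way to do that filtering afterward, sketched under the assumption that the alignments carry the NH:i: tag (the number of reported alignments for the read), which TopHat output typically includes: keep only the records with NH:i:1. The function name keep_unique and the output file name unique_hits.bam are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read SAM text on stdin, pass headers through, and keep
# only alignments tagged NH:i:1 (reads reported at a single location).
keep_unique() {
    awk -F'\t' '
        /^@/ { print; next }               # keep header lines unchanged
        {
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i == "NH:i:1") { print; next }
            # lines without NH:i:1 fall through and are dropped
        }'
}

# Assumed usage (-h keeps the header so the result can be written back as BAM):
#   samtools view -h accepted_hits.bam | keep_unique | samtools view -bS -o unique_hits.bam -
```

The FPKM values would then need to be recalculated from the filtered file. If the NH tag is missing from the file, this sketch drops every alignment, so it is worth checking a few lines by eye first.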
Devon, I don't know what you mean by "You should be able to discern what from the file's contents." Please explain more.
And it would be very kind of you if you could show me how to remove multimappers from my data. Would it be an advantage for the rest of the analysis? I mean, would it be more accurate then?
You're trying to extract alignments that cover a specific region, so if the ones in the file don't cover that region then something went amiss. Given the size of that file, my guess would be that its contents aren't really what you want.
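A quick way to run that check is to summarise which chromosomes and start positions actually appear in the extracted file. The sketch below does this for SAM text on stdin; the function name summarise_positions is hypothetical. Two things in the earlier command may also be worth checking, though neither is confirmed as the cause here: the region II:2941980-2941734 has a start larger than its end, whereas regions are normally written with the smaller coordinate first, and less is not needed when the output is redirected to a file.

```shell
#!/bin/sh
# Hypothetical helper: for SAM alignment lines on stdin, print one line per
# reference name with the record count and the smallest and largest POS seen.
summarise_positions() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            pos = $4 + 0                   # force numeric comparison
            n[$3]++
            if (!($3 in lo) || pos < lo[$3]) lo[$3] = pos
            if (!($3 in hi) || pos > hi[$3]) hi[$3] = pos
        }
        END { for (c in n) print c "\t" n[c] "\t" lo[c] "\t" hi[c] }'
}

# Assumed usage on the file from the thread:
#   summarise_positions < 1.txt
```

If the summary lists chromosomes other than II, or positions far from the requested window, the file holds much more than the region that was asked for.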