Hello,
I made two PCA plots from two types of mappings to the same genome. In one of them, I allow each read to map to just one locus, and in the other I let reads map up to 100 times. After that, I do transcript assembly with StringTie on the uniquely mapped reads, and I use the software TEtranscripts for the multi-mapped reads (this plot is made with the gene read counts only).
The PCA plots look quite different: the "unique" mapping separates the two conditions pretty well, while the other one does not separate them at all. I was wondering whether I can still trust the results from the second mapping, and what this difference really means.
Thanks!!
Uniquely mapped:
Multi mapped:
Which features is the PCA plot based on, and how many of them? Or is it based on all genes?
The unique-mapping plot is made with transcripts (79,000). TEtranscripts outputs gene read counts, so that plot uses fewer features (24,333 genes).
TEtranscripts is a very odd choice for this situation, and I'm not sure how well it will perform, given that it's meant to be used with genomic alignments in the context of a well-annotated genome (with repeats). Please use something like salmon instead. What organism is this?
Thank you for your answer. This is human data, and I am using the repeat information that TEtranscripts offers for other analyses.
I was expecting similar results from the two approaches and was wondering whether there was a reason behind the difference. Even if the software was not specifically designed to work only with gene counts, I think it is relevant to make sure the conditions are somehow being differentiated if the software will perform a differential expression analysis (DEA) afterwards. Don't you think?
There's no need to perform transcript assembly for human data (don't bother with StringTie). What this suggests is that repeat expression is dominating the signal and does not differ between conditions.
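One rough way to check whether a few features dominate the PCA without differing by condition is to look at each feature's share of the total variance next to its difference between conditions. In an unscaled PCA the total variance is the sum of the per-feature variances, so a feature with a very large share will drive the first components. The sketch below is only an illustration: the function name `variance_report`, the feature names, and the counts are all made up, and it assumes exactly two conditions and a log2(count + 1) transform.

```python
from math import log2
from statistics import mean, pvariance


def variance_report(counts, conditions, top_n=3):
    """Rank features by their share of the total variance of log2(count + 1).

    counts: dict mapping feature name -> list of raw counts, one per sample.
    conditions: list of condition labels, one per sample (exactly two labels).
    Returns up to top_n tuples of
    (feature, share_of_total_variance, abs_difference_of_condition_means),
    sorted by share, largest first.
    """
    labels = sorted(set(conditions))
    if len(labels) != 2:
        raise ValueError("this sketch assumes exactly two conditions")

    # Log-transform so that a handful of huge counts do not swamp everything.
    logged = {f: [log2(c + 1) for c in row] for f, row in counts.items()}
    variances = {f: pvariance(values) for f, values in logged.items()}
    total = sum(variances.values())

    report = []
    for feature, values in logged.items():
        group_a = [v for v, c in zip(values, conditions) if c == labels[0]]
        group_b = [v for v, c in zip(values, conditions) if c == labels[1]]
        share = variances[feature] / total if total else 0.0
        report.append((feature, share, abs(mean(group_a) - mean(group_b))))

    report.sort(key=lambda row: row[1], reverse=True)
    return report[:top_n]


# Toy data, invented for illustration only (not from the thread).
conditions = ["ctrl", "ctrl", "treat", "treat"]
counts = {
    "noisy_feature": [10, 5000, 20, 4000],  # varies a lot, but not by condition
    "gene_A": [100, 110, 200, 210],         # modest, condition-driven change
    "gene_B": [50, 52, 49, 51],             # essentially flat
}

for feature, share, diff in variance_report(counts, conditions):
    print(f"{feature}\tvariance share={share:.3f}\tcondition diff={diff:.3f}")
```

If the features at the top of such a list have a large variance share but a small condition difference, the PCA will mostly show that variation rather than the contrast between conditions.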
Thank you for your answer!