Hello!
I'm working with RNA-seq data (tumor tissues sampled from female and male patients). After mapping, only 29-55% of reads are assigned to exons, and 39-60% are unassigned because they are multi-mapped. Is this reasonable for RNA-seq data?
I also checked the number of reads mapped to each chromosome. Some of the samples have a very high number of reads mapped to chromosomes M, 21, 7, 1 and 22. I thought this had to do with the nature of tumors (in this case colon) and the fact that mitochondrial DNA alterations have been widely reported in many tumors. Am I correct, or is there something wrong with my data or analysis?
I do appreciate your kind help.
60% unassigned/multi-mapped reads is a little high. You can check for DNA contamination with read_distribution.py from RSeQC (if contamination is present, there will be more hits in non-coding regions).
I forgot to mention that I tried read_distribution.py, too: about 48% of the reads are mapped to CDS-Exons, and the other half are mostly mapped to 3UTR-Exons and introns, with some in TSS-up-5kb and TSS-up-10kb!
So, you don't think this is typical for tumor data?
If 52% of the reads are aligning to regions other than CDS-Exons, especially introns, then I would suspect there may be DNA contamination. Most of the downstream analyses should be okay assuming you have enough reads mapped to CDS-Exons.
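To put numbers on this, here is a minimal sketch that turns the table printed by read_distribution.py into per-group fractions. It rests on two assumptions you should check against your own output: that data rows have four whitespace-separated columns (Group, Total_bases, Tag_count, Tags/Kb), and that the TSS_up and TES_down windows are nested, so only the 10kb windows are counted in the denominator. The function name tag_fractions is made up for illustration.

```python
# Assumption: the shorter flanking windows are contained in the 10kb windows,
# so they are left out of the total to avoid counting the same tags twice.
NESTED_GROUPS = {"TSS_up_1kb", "TSS_up_5kb", "TES_down_1kb", "TES_down_5kb"}


def tag_fractions(report_text):
    """Return {group: tag_count / total} for a read_distribution.py-style table."""
    counts = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Assumed row layout: Group  Total_bases  Tag_count  Tags/Kb
        if len(fields) != 4 or fields[0] == "Group":
            continue
        try:
            counts[fields[0]] = int(fields[2])
        except ValueError:
            # Summary lines such as "Total Assigned Tags ..." end up here.
            continue
    total = sum(c for g, c in counts.items() if g not in NESTED_GROUPS)
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.items()}
```

Save the report to a file, pass its contents to tag_fractions, and compare the "CDS_Exons" and "Introns" entries across samples. Samples where the intronic share is much higher than in the rest are the ones to look at first.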
Considering that you have RNA-Seq data, I would say it's not typical regardless of tissue type.
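On the per-chromosome counts: raw numbers are hard to compare because the chromosomes differ so much in length, so it may help to look at each chromosome's share of mapped reads and at reads per megabase. Below is a rough sketch that does this for the tab-separated output of samtools idxstats (reference name, sequence length, mapped reads, unmapped reads). The function name chrom_summary is made up for illustration, and this only summarizes the counts; it does not tell you why a chromosome is high.

```python
def chrom_summary(idxstats_text):
    """Return {chrom: (share_of_mapped_reads, mapped_reads_per_Mb)}."""
    rows = []
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        # Skip malformed lines and the "*" row that holds unplaced reads.
        if len(fields) < 3 or fields[0] == "*":
            continue
        rows.append((fields[0], int(fields[1]), int(fields[2])))

    total_mapped = sum(mapped for _, _, mapped in rows)
    summary = {}
    for name, length, mapped in rows:
        share = mapped / total_mapped if total_mapped else 0.0
        per_mb = mapped / (length / 1e6) if length else 0.0
        summary[name] = (share, per_mb)
    return summary
```

You can feed it the text captured from running samtools idxstats on each sorted, indexed BAM file, then compare the shares between samples rather than between chromosomes within one sample.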
Thank you very much. I'm just gonna do the DE analysis and see what the results suggest.
I will take these into account.
Best regards