8.9 years ago
kanwarjag
I have an RNA-seq dataset in which, for some samples, more than 70% of reads map to exons, while for others only 10-25% do. How should such data best be analyzed for differential expression? Should we split the samples into two sets, call DE in each, and then combine the results, or should we analyze everything as one set? My worry is that if we analyze everything as one set, the results will probably be skewed. What would be the best approach? Are there any packages or tools designed to deal with this problem?
Thanks
Hi,
Have you looked into what is different about the samples with the low alignment rate? Are their total read counts in a similar range to the rest? Is this by any chance total RNA-seq? (rRNA might be skewing the numbers.)
If you can afford to leave samples out, you can of course skip the troubleshooting and use only the well-behaved ones. Also, if you normalize the read counts with Bioconductor packages such as DESeq or voom (part of limma) [or other packages mentioned in the paper], check how the errant samples look afterwards. If they still stand out from the others after normalization, it is probably best to leave them out.
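As a rough illustration of the "do they stand out after normalization" check, here is a minimal Python sketch. It is not what DESeq or voom do internally: it assumes a plain counts-per-million normalization, a log2 transform, and a sample-to-sample Pearson correlation. The function name `flag_outlier_samples` and the `n_mads` cutoff are made up for this example, and the cutoff would need tuning on real data.

```python
import numpy as np

def flag_outlier_samples(counts, n_mads=3.0):
    """Flag samples whose expression profile correlates poorly with the rest.

    counts : genes x samples array of raw read counts (hypothetical input).
    n_mads : how many scaled MADs below the typical correlation a sample
             must fall to be flagged (an arbitrary, illustrative cutoff).
    Returns (median correlation per sample, boolean outlier flags).
    """
    counts = np.asarray(counts, dtype=float)

    # Simple library-size normalization: counts per million, then log2
    # with a pseudocount of 1 so that zero counts stay finite.
    lib_sizes = counts.sum(axis=0)
    log_cpm = np.log2(counts / lib_sizes * 1e6 + 1.0)

    # Pearson correlation between every pair of samples (columns).
    corr = np.corrcoef(log_cpm, rowvar=False)
    n = corr.shape[0]

    # For each sample, take the median correlation to all OTHER samples.
    off_diag = corr[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    med_corr = np.median(off_diag, axis=1)

    # Robust cutoff: samples far below the typical median correlation.
    center = np.median(med_corr)
    mad = np.median(np.abs(med_corr - center))
    threshold = center - n_mads * 1.4826 * mad
    return med_corr, med_corr < threshold
```

Samples flagged here are only candidates for a closer look (for example in a PCA or clustering plot of the normalized counts), not an automatic exclusion list.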
Total reads are close to begin with; I would say the difference is <25%. I have used DESeq and edgeR, but those samples still stand out.