Hi all,
I am interested in evaluating four different rRNA depletion wet-lab protocols, and I want to see whether any of them are biased against transcripts other than rRNA. Ultimately, I want to find out which protocol introduces the least bias against non-rRNA transcripts.
I quantified all the samples with sailfish, so I now have the number of reads and the TPM for each transcript. For each of the four wet-lab protocols I have between 6 and 30 samples.
Is it enough to just compare the TPM values for each transcript, and perhaps look at their variances, or should I do a regular DESeq differential expression analysis?
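As a rough first look before a full DESeq analysis, one option is to compute the mean log2(TPM + 1) of each transcript within each protocol and flag transcripts whose protocol means differ the most. This is a minimal sketch, assuming the TPMs have already been loaded into a hypothetical nested dict `tpm[protocol][sample][transcript]`. The pseudocount of 1 and the 1.0 log2 threshold are arbitrary illustration choices, not recommendations:

```python
import math
from statistics import mean


def protocol_means(tpm):
    """Mean log2(TPM + 1) per transcript for each protocol.

    tpm: hypothetical layout {protocol: {sample: {transcript: tpm_value}}}.
    A transcript missing from a sample is treated as TPM 0.
    """
    means = {}
    for protocol, samples in tpm.items():
        transcripts = set()
        for values in samples.values():
            transcripts.update(values)
        means[protocol] = {
            t: mean(math.log2(values.get(t, 0.0) + 1.0) for values in samples.values())
            for t in transcripts
        }
    return means


def flag_spread(means, threshold=1.0):
    """Transcripts whose protocol means differ by more than `threshold` (log2 units).

    A transcript absent from a protocol is treated as a mean of 0 there.
    """
    all_transcripts = set()
    for per_t in means.values():
        all_transcripts.update(per_t)
    flagged = {}
    for t in sorted(all_transcripts):
        vals = [per_t.get(t, 0.0) for per_t in means.values()]
        spread = max(vals) - min(vals)
        if spread > threshold:
            flagged[t] = spread
    return flagged


# Made-up numbers for illustration only
example = {
    "A": {"s1": {"geneX": 15.0, "geneY": 100.0}, "s2": {"geneX": 15.0, "geneY": 100.0}},
    "B": {"s3": {"geneX": 63.0, "geneY": 100.0}},
}
print(flag_spread(protocol_means(example)))  # {'geneX': 2.0}
```

TPM is a relative measure within each sample, so this is only descriptive. DESeq works on the read counts and models dispersion between replicates, which makes it better suited for actually testing whether a protocol shifts a transcript.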
Also, unfortunately, some samples were only processed with one protocol (sorry, that's what the lab gave me...). Is it even worth including these? If so, should they be treated differently?
Thanks for any suggestions!
Edit: I ran SortMeRNA to remove ribosomal RNA reads before quantifying with sailfish. This is E. coli data, in case it matters.
Have you aligned the samples against the rDNA repeat for your organism to estimate what fraction of reads map to it?
Yes, I have. I ran SortMeRNA to remove ribosomal RNA reads before quantifying with sailfish. This is E. coli data, in case it matters.
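To put a number on depletion efficiency, the fraction of reads that SortMeRNA assigned to rRNA can be computed per sample. This is a minimal sketch, assuming the total and rRNA read counts have already been extracted from the SortMeRNA output into hypothetical dicts keyed by sample name; the samples and counts below are made up:

```python
def rrna_percent(total_reads, rrna_reads):
    """Percentage of reads assigned to rRNA for each sample.

    total_reads, rrna_reads: hypothetical dicts {sample: read_count}.
    A sample missing from rrna_reads is treated as having 0 rRNA reads.
    """
    result = {}
    for sample, total in total_reads.items():
        if total <= 0:
            raise ValueError(f"sample {sample!r} has no reads")
        result[sample] = 100.0 * rrna_reads.get(sample, 0) / total
    return result


# Made-up counts for illustration only
totals = {"s1": 1_000_000, "s2": 2_000_000}
rrna = {"s1": 250_000, "s2": 1_500_000}
print(rrna_percent(totals, rrna))  # {'s1': 25.0, 's2': 75.0}
```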
If you look at the number of rRNA reads that were removed from the various samples, do you see any correlation with the TPM values you have? Ultimately, getting the right DE result is the real litmus test. Individual samples (not tested against all of the protocols) won't help you decide between the protocols, will they?
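If the comparison is restricted to source samples that were processed with every protocol, each protocol is compared on the same starting material. This is a minimal sketch of that filter, assuming a hypothetical mapping from each protocol to the list of source-sample IDs it was applied to:

```python
def shared_samples(protocol_samples):
    """Sample IDs processed with every protocol, sorted."""
    sets = [set(ids) for ids in protocol_samples.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


def single_protocol_samples(protocol_samples):
    """Sample IDs that appear under exactly one protocol, sorted."""
    counts = {}
    for ids in protocol_samples.values():
        for sid in set(ids):
            counts[sid] = counts.get(sid, 0) + 1
    return sorted(sid for sid, n in counts.items() if n == 1)


# Hypothetical protocol -> sample assignment
design = {"P1": ["a", "b", "c"], "P2": ["a", "b"], "P3": ["a", "b", "d"]}
print(shared_samples(design))           # ['a', 'b']
print(single_protocol_samples(design))  # ['c', 'd']
```

Samples in the second list could still contribute to within-protocol variance estimates, but they carry no paired information for comparing protocols.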
Could you elaborate on how to correlate the data?
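One way to do it, sketched under assumptions: for each transcript, compute the Spearman rank correlation between the per-sample rRNA percentage and that transcript's TPM across samples. Transcripts with a strong positive or negative correlation track depletion efficiency and are candidates for protocol bias. The data layout is hypothetical, and the rank correlation is written out by hand (average ranks for ties) to keep the example dependency-free; `scipy.stats.spearmanr` would be the usual library choice:

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlate_with_rrna(rrna_pct, tpm):
    """Spearman rho between rRNA % and TPM for each transcript.

    rrna_pct: hypothetical {sample: percent_rRNA}
    tpm: hypothetical {sample: {transcript: tpm_value}}; missing = TPM 0
    """
    samples = sorted(rrna_pct)
    x = [rrna_pct[s] for s in samples]
    transcripts = set()
    for s in samples:
        transcripts.update(tpm[s])
    return {t: spearman(x, [tpm[s].get(t, 0.0) for s in samples]) for t in sorted(transcripts)}


# Made-up numbers for illustration only
rrna_pct = {"s1": 10.0, "s2": 40.0, "s3": 70.0}
tpm = {
    "s1": {"geneX": 5.0, "geneY": 30.0},
    "s2": {"geneX": 8.0, "geneY": 20.0},
    "s3": {"geneX": 12.0, "geneY": 10.0},
}
print(correlate_with_rrna(rrna_pct, tpm))  # {'geneX': 1.0, 'geneY': -1.0}
```

With only a handful of samples per protocol, individual rho values will be noisy, so this is best read as a screening step rather than a test.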
Below I plotted three samples, each treated with a different rRNA depletion protocol. The third protocol removed the most rRNA (i.e. its percentage on the second y-axis is the highest).