Hello all,
I am using the Kallisto/Sleuth workflow to analyze some data, following this tutorial (https://pachterlab.github.io/sleuth_walkthroughs/trapnell/analysis.html). I was able to complete the tutorial successfully and obtain outputs from Sleuth, which I then viewed through the Shiny web app interface. For now I'm just analyzing 12 samples (6 samples with condition A vs. 6 samples with condition B).
However, some of the outputs from Sleuth look bizarre and not what I expected, namely the mean-variance plot and the volcano plot (I've attached images to this post). I'm also noticing that the q-values for large groups of transcripts are exactly the same, which creates the horizontal cutoff line in the volcano plot where all of the transcripts congregate. I asked around about this before and was told it is probably a processing or normalization error. How can I check for this? Is it something to do with the input data from Kallisto, or is there something I can tweak in the Sleuth R script?
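One quick sanity check you can do yourself: export the results table from R (e.g. `write.csv(sleuth_results(so, test), "sleuth_results.csv")`, where `so` and `test` are your sleuth object and test name) and count how many transcripts share the exact same q-value. A minimal Python sketch, assuming a CSV with a `qval` column (the file name here is just a placeholder):

```python
import csv
from collections import Counter

def qvalue_ties(qvals, top=5):
    """Return the most common q-values and how many transcripts share each."""
    return Counter(qvals).most_common(top)

def ties_from_csv(path):
    """Read an exported sleuth results table and summarize tied q-values."""
    with open(path, newline="") as fh:
        qvals = [row["qval"] for row in csv.DictReader(fh) if row["qval"] != "NA"]
    return qvalue_ties(qvals)
```

If one q-value is shared by hundreds of transcripts, that's the band you're seeing in the volcano plot.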
I'm pretty new to all this - I'm learning as I go so any feedback is much appreciated! Thanks in advance.
Did you perform QC of your FASTQ files (e.g. with fastqc)?
What was your kallisto output (e.g. how many reads were processed and successfully pseudoaligned)?
Hi Delaney - I did perform QC, followed by quality trimming/filtering with the FASTX-Toolkit. The number of processed reads ranged from 20 to 40 million per sample, and the number that were pseudoaligned ranged from 500,000 to 1.5 million. Pseudoalignment was not great because I'm working with a non-model organism, so pseudoalignment rates ranged from 20-35%. Some rates were very low (below 5%), so I'm considering removing those samples from the Sleuth analysis.
With this in mind, do you think it's these low pseudoalignment rates (particularly the ones below 5%) that may be skewing my data? Another grad student did a similar experiment with the same species, had similar pseudoalignment rates of around 30%, and still managed to obtain meaningful, reliable results.
Thanks!
I think it's definitely possible that the pseudoalignment rates below 5% could be messing things up; at such low rates you may not be aligning to anything "biological", so your data could be mostly noise. I'd recommend trying just the samples with higher pseudoalignment rates.
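If it helps, here's a minimal sketch of how you could screen samples by pseudoalignment rate before building the sleuth object. It reads the `run_info.json` that kallisto writes into each output directory (the 20% cutoff is just an assumption based on your numbers - tune it for your data):

```python
import json
from pathlib import Path

def pseudoalignment_rate(run_info):
    """Fraction of processed reads that kallisto pseudoaligned (0-1)."""
    return run_info["n_pseudoaligned"] / run_info["n_processed"]

def keep_samples(kallisto_dirs, min_rate=0.20):
    """Return the kallisto output directories whose rate meets min_rate.

    min_rate=0.20 is an assumed cutoff, not an official recommendation.
    """
    kept = []
    for d in kallisto_dirs:
        info = json.loads((Path(d) / "run_info.json").read_text())
        rate = pseudoalignment_rate(info)
        print(f"{d}: {rate:.1%} pseudoaligned")
        if rate >= min_rate:
            kept.append(d)
    return kept
```

You'd then subset your sample-to-condition table in R to just the kept samples before calling `sleuth_prep()`.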
Thank you, Delaney! I removed the samples with low pseudoalignment rates this time around and the volcano and mean-variance plots look a lot more normal.