Entering edit mode
9 months ago
unkown
•
0
Hello all,
I am in the process of moving from Deseq2 to Sleuth for all my bulk RNAseq analysis. The biggest question that I have is how do I plot a PCA plot using variance instead of counts with Sleuth results?
I started by using the plot_pca
function. This one however, shows the read counts, I also am not sure how to read this data.
Method 1: plot_pca
+ sleuth
so = sleuth_fit(so, ~sampletype, fit_name = "full")
so = sleuth_fit(so, ~1, fit_name = "reduced")
so = sleuth_lrt(so, null_model = "reduced", alt_model = "full")
res = sleuth_results(so, test = "reduced:full", test_type = "lrt", show_all = TRUE)
plot_pca(so, color_by = "sampletype", text_labels = TRUE, units = "scaled_reads_per_base") +
geom_point(size=14, pch=0.5)+
theme_bw()+ theme(axis.title.x = element_text(face = "bold", size=20),
axis.title.y = element_text(face = "bold", size=20),
axis.text.x = element_text(face="bold", color="#000000", size=20),
axis.text.y = element_text(face="bold", color="#000000", size=20),
legend.title=element_text(face="bold", size=5),
strip.text.x = element_text(size = 18),
strip.text = element_text(size=10),
strip.placement = "outside")
The other alternative is to extract the read count matrix and plot it using prcomp and ggplot2.
Method 2: prcomp
+ ggplot
norm_counts <- sleuth_to_matrix(so, "obs_norm", "scaled_reads_per_base")
log_norm_counts <- so$transform_fun_counts(norm_counts)
pc <- prcomp(t(log_norm_counts))
plot2_pca <- data.frame(pc$x, s2c)
ggplot(plot2_pca, aes(PC1, PC2)) +
geom_point(aes(color=sampletype),size=14, pch=0.5) +
xlab('PC1') + ylab('PC2') +
geom_text_repel(aes(label=sample)) +
theme_bw() +
theme(axis.title.x = element_text(face = "bold", size=20),
axis.title.y = element_text(face = "bold", size=20),
axis.text.x = element_text(face="bold", color="#000000", size=20),
axis.text.y = element_text(face="bold", color="#000000", size=20),
legend.title=element_text(face="bold", size=5),
strip.text.x = element_text(size = 18),
strip.text = element_text(size=10), strip.placement = "outside")
Questions:
- What am I doing wrong with method 2? Why do my plots look so different, especially, the PGB1 samples? In method 1, the two PGB1 samples are close together, while in method 2 they show a great deal of separation?
- Is there a way to plot the variance using
plot_pca
? I haven't come across any during all my searches.
Thank you!
Nothing, this is how PCA is commonly done. The sleuth axis limits look extremely odd. Any reason to use sleuth? It did not convince in the benchmark papers I recall, plus it is mainly for transcript-level DE as it can use the inferential replicate information from kallisto.
Hello,
Thank you for your reply. I used salmon with bootstraps and its advised to use Sleuth if I want to make use of my bootstrap information. I typically use Salmon counts + Deseq2, but wanted to switch over because sleuth is recommended.
Would you suggest I stick to Salmon + Deseq2, rather than Sleuth?
Could you also link the benchmark papers you were talking about? Most of the papers that I came across say that Sleuth is just as good as Deseq2 in most cases, with the exception of low abundance genes.
For gene level differential expression I would indeed stick to the most established tools, be it DESeq2, edgeR or limma. They all do not use inferential information. It is not common to use it on gene level, since the main mapping ambiguity is between transcripts of the same gene on transcript-level. Don't make it too fancy, do what everyone does unless there is a good reason not to in your particular case. After all, sleuth does not seem to be maintained, it has almost 100 open issues on GitHub https://github.com/pachterlab/sleuth with no commit in the last two years, while other mentioned tools are still under active maintenance with authors outstandingly responsive on the Bioconductor support site and here. Your choice in the end.
If you really need the inferential information on gene level I would use https://bioconductor.org/packages/release/bioc/vignettes/fishpond/inst/doc/swish.html#Differential_gene_expression. THis tool is a collab between the salmon authors and the DESeq2 people, so this is probably the best choice in terms of making full use of what salmon offers.
Ah yeah, thank you for the confirmation. I wasn't sure if I should switch over. I did need to do transcript level analysis too. but i think ill do as you have suggested: Deseq2 for gene level and Swish for transcript level.
Hope you have a great day :)
A few things: sleuth is used for either gene-level DE or transcript-level DE, not just transcript-level DE (in fact, I used it for gene-level DE in my recent paper: https://www.nature.com/articles/s41388-022-02458-9 ). sleuth is an established tool for gene-level DE and has performed well on benchmarks, but it's true it would be harder to find support for sleuth (Edit: I just realized that I was the one who made the most recent commit for sleuth so mea culpa).
Making use of bootstraps tends to perform better, whether sleuth, swish, or edgeR. In fact, I recommend using edgeR. edgeR has a "catchKallisto" and a "catchSalmon" feature to leverage that information -- it has performed the best on the most recent benchmarks (see Gordon Smyth's most recent paper).
That said, I've used DESeq2 in the past and it works great -- the loss of leveraging the bootstrap information hasn't prevented me from making relevant biological discoveries :)