I am using the Rtsne package to perform cell clustering of single-cell RNA-seq data. I first take my raw counts, normalise them by library size, and identify the top 1000 highly variable genes. I save these genes and cluster the cells with t-SNE. The resulting plot shows 10 distinct clusters. I have been successful in obtaining this plot, but I now need to visualise the distribution of individual gene expression within the clusters obtained by t-SNE. Can anyone suggest a website or online tutorial on how to do this?
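The preprocessing described above (library-size normalisation, then picking the top 1000 highly variable genes) can be sketched in base R as follows. This is a minimal illustration, not the poster's exact pipeline: the count matrix is simulated, it scales to counts per million rather than TPM, and it ranks genes simply by the variance of their log expression. Dedicated tools that model the mean-variance trend are usually preferred for real data.

```r
# Minimal sketch: library-size normalisation and selection of the most
# variable genes. 'counts' is a simulated genes x cells matrix standing in
# for real raw counts (hypothetical data).
set.seed(1)
counts <- matrix(rpois(2000 * 50, lambda = 5), nrow = 2000, ncol = 50,
                 dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:50)))

# Divide each cell (column) by its library size, then scale to counts per million
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Rank genes by the variance of their log-transformed expression
log_cpm <- log2(cpm + 1)
gene_var <- apply(log_cpm, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[1:1000]

# Keep the normalised (unlogged) values of the top 1000 genes
genes1000 <- cpm[top_genes, ]
dim(genes1000)
```

The resulting genes x cells matrix can be passed to `Rtsne(t(genes1000))`, since Rtsne expects one row per cell.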
Essentially, I would like to create a figure like Figure 1b of this paper: https://www.nature.com/articles/nature20105
At the minute I have the following code:
# t-SNE of the top 1000 variable genes in my dataset
library("Rtsne")
set.seed(42)  # t-SNE is stochastic; fixing the seed gives a reproducible layout
# genes1000: genes x cells matrix of TPM values for the top 1000 genes;
# Rtsne expects one row per cell, hence the transpose
tsne <- Rtsne(t(genes1000))
# colour points by group (10 clusters); 'branch' holds each cell's cluster number (1-10)
cluster_cols <- c("purple","orange","blue","forestgreen","darkgrey","yellow","red","maroon","skyblue","brown")
plot(tsne$Y, col=cluster_cols[branch], bg=cluster_cols[branch], pch=21, main="", xlab="t-SNE 1", ylab="t-SNE 2")
legend("bottomleft", legend=paste("Group", 1:10), fill=cluster_cols, border=FALSE, cex=0.8)
# log transformation of TPM values (named log_expr so it does not mask base::log)
log_expr <- log2(genes1000 + 0.001)
There was a recent question here, which you may find of use: Rtsne plot labelling
Kevin
Thank you, Kevin, but I still can't colour my plot by gene expression :(
I'm currently working remotely with no access to journals; however, Google found the figure for me despite the access restrictions. Can you confirm that it's this figure: https://media.nature.com/full/nature-assets/nature/journal/v539/n7627/images/nature20105-sf5.jpg
Figure 1b is just a violin plot? It looks like they have taken the sample-to-cluster (tSNE cluster) assignment, and then just plotted the normalised expression values. If you want to generate a violin plot, then take a look at A: Hierarchical Clustering in single-channel agilent microarray experiment
You may have to supply your own names to each cluster based on what you believe they represent. The tSNE algorithm will just regard them as cluster 1, cluster 2, etc.
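If the goal is the per-cluster distribution described above, a base-R sketch could look like this. It assumes you already have one gene's normalised expression per cell and the cluster assignment; both are simulated here, and the group names are placeholders for whatever names you assign. It draws boxplots because they need no extra packages; `geom_violin()` from ggplot2 would give the violin shape instead.

```r
# Sketch: distribution of one gene's expression in each t-SNE cluster.
# 'expr' and 'branch' are simulated stand-ins (hypothetical data) for one
# gene's normalised expression per cell and each cell's cluster number.
set.seed(2)
branch <- sample(1:10, 300, replace = TRUE)
expr <- rnorm(300, mean = branch, sd = 1)

# Replace cluster numbers with your own names (placeholders here)
cluster_names <- paste("Group", 1:10)
cluster <- factor(branch, levels = 1:10, labels = cluster_names)

# One box per cluster; las = 2 rotates the axis labels
boxplot(expr ~ cluster, las = 2, xlab = "", ylab = "log2 expression")

# Per-cluster medians, handy as a quick numeric check
meds <- tapply(expr, cluster, median)
meds
```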
Hello Kevin, thank you for your help. That is not Figure 1b. This is: https://media.nature.com/lw926/nature-assets/nature/journal/v539/n7627/images/nature20105-f1.jpg
I see - thanks. You would have to get the expression values for your gene of interest in each cell, and then colour these expression values with something like:
In this example, 'numbers' would contain your expression values for your gene of interest.
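The snippet referred to above is not preserved in this thread, so here is an independent sketch of the same idea: mapping a numeric vector such as `numbers` onto a colour gradient, one colour per cell. The values are simulated, and the grey-to-red palette with 100 equal-width bins is an arbitrary choice.

```r
# Sketch: map a numeric vector to a colour gradient, one colour per cell.
# 'numbers' is simulated here (hypothetical); in practice it holds your
# gene's expression value in each cell.
set.seed(3)
numbers <- rexp(200)

# 100-step palette from light grey (low) to red (high)
gradient_cols <- colorRampPalette(c("lightgrey", "red"))(100)

# Bin each value into one of 100 equal-width intervals spanning its range
bins <- cut(numbers, breaks = 100, labels = FALSE)
point_cols <- gradient_cols[bins]
```

`point_cols` can then be passed as `col=` to `plot()`, in the same cell order as `numbers`.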
I have added the code I have so far. Any advice would be great, thanks.
Thanks for adding. So, you need to extract the expression values for your gene of interest from genes1000, colour these expression values in a gradient using the code that I posted above, and then supply this colour vector to plot(). The sample ordering in genes1000 would have to match that of tsne$Y, though.
Thank you for this -
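The three steps Kevin lists (extract one gene, colour it on a gradient, pass the colours to `plot()`) could be combined roughly as below. This is a sketch with simulated data: `Y` stands in for `tsne$Y`, the gene name is a placeholder, and the `log2(x + 1)` pseudocount is an assumption (the question uses 0.001). The ordering requirement holds automatically when `tsne` was computed from `t(genes1000)`, because Rtsne returns one row per input row in the same order.

```r
# Sketch: colour a t-SNE layout by one gene's expression.
# Simulated stand-ins (hypothetical): 'genes1000' is a genes x cells TPM
# matrix and 'Y' plays the role of tsne$Y (row i = column i of genes1000).
set.seed(4)
genes1000 <- matrix(rexp(1000 * 150), nrow = 1000,
                    dimnames = list(paste0("gene", 1:1000), paste0("cell", 1:150)))
Y <- matrix(rnorm(150 * 2), ncol = 2)

# 1. Extract the gene of interest (placeholder name) and log-transform it
gene <- "gene7"
expr <- log2(genes1000[gene, ] + 1)

# 2. Colour the values on a grey-to-red gradient with 100 bins
gradient_cols <- colorRampPalette(c("lightgrey", "red"))(100)
point_cols <- gradient_cols[cut(expr, breaks = 100, labels = FALSE)]

# 3. Supply the colour vector to plot(); one colour per row of Y
stopifnot(length(point_cols) == nrow(Y))
plot(Y, col = point_cols, pch = 19, main = gene,
     xlab = "t-SNE 1", ylab = "t-SNE 2")
```

Repeating steps 1 to 3 for several marker genes (for example inside `par(mfrow = c(2, 2))`) gives one panel per gene.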