Question

[R] Unsupervised hierarchical clustering RNA-seq ?

0

Entering edit mode

2.6 years ago

ali ▴ 20

Hi everyone , I recently finished to replicate a work made on RNA-seq data on genes involved in Tumor Educated Platelet. Now at the very end of the article (from which I replicated the analysis) they mention the following : Unsupervised hierarchical clustering was performed by Ward clustering and Pearson distances. Non-random partitioning, and corresponding p value, of unsupervised hierarchical clustering was determined using a Fisher’s exact test. Now I was wondering how do I compute this final part? What I have is a count matrix ( non-normalized and normalized) with Cancer and Control samples (285 in total) as columns and 5000 genes as rows, and a factor of 2 ( 1 = tumor , 0 = HC ). I have another factor too of 6 that specifies even the specific tissue of the tumor. For what I got this is useful in order to give more significance to the clustering , but I really don't know how to do this in R.

RNA-seq R fisher Counts • 2.1k views

ADD COMMENT • link 2.6 years ago by ali ▴ 20

0

Entering edit mode

Ali, check this tutorial on unsupervised gene expression clustering, you may find answers to most of your questions by walking through that post.

ADD REPLY • link 2.6 years ago by Hamid Ghaedi 3.3k

0

Entering edit mode

Tnx I will give it a look.

ADD REPLY • link 2.6 years ago by ali ▴ 20

score 0 · Answer 1 · 2022-05-09

I can't help you with the second part, but this is at least some sample code which demonstrates a simple hierarchical clustering in R:

sampledata <- matrix(runif(1e3,0,100)^2,ncol=10,nrow=100)
colnames(sampledata) <- paste("Sample",LETTERS[1:10],sep="_")

  library("stats")

  thematrix <- as.matrix(t(sampledata))

  # calculate distance
  thematrix.dc <- dist(thematrix, method="euclidian")
  thematrix.hc <- hclust(as.dist(thematrix.dc), method="ward.D2")

  cpm_cluster <- list()
  cpm_cluster[["hc"]] <- thematrix.hc
  cpm_cluster[["ord"]] <- rownames(thematrix)[order.dendrogram(as.dendrogram(thematrix.hc))]

  library("ape")
  library("ggplot2")
  library("ggtree")
  library("grid")
  library("gridExtra")

  layout <- c('rectangular','slanted','fan','circular','radial','unrooted')

  mylayout <- layout[layout.style]
  if(is.na(mylayout)){mylayout <- layout[1]} # fallback in case invalid layout was chosen

  # convert to phylo class tree (ape)
  # build tree with layout
  myphylo <- as.phylo(cpm_cluster[["hc"]])
  mytree <- ggtree(myphylo,
                   layout=mylayout,
                   ladderize=FALSE,
                   color="black")

  mytree[["data"]][["label"]] <- gsub("_"," ",mytree[["data"]][["label"]])

  mytree <- mytree + geom_tiplab(aes(label=label),color="black", size=3, vjust=-0.2, hjust=1.15)
  mytree <- mytree + geom_tippoint(size=3.5,shape=15) + geom_nodepoint(color="black",size=2)
  mytree <- mytree + scale_color_brewer("Group",palette = "Set3") 

  print(mytree)

I have to admit that I only know Pearson correlation, but not Pearson distances, so in this example Euclidean distances are used. Have a look at tree annotation if you like to color your tips according to the tumor/HC groups.