classifying samples by TCGA signature
4.4 years ago

Hi all,

I have some RNA-seq samples from multiple glioblastoma tumours that I'm now trying to classify according to a specific gene signature (from Verhaak et al., 2010) using R. The gene signature is reported as a gene list with specific centroids for each of the 4 clusters (https://api.gdc.cancer.gov/data/941f81a1-05d7-4f84-80ec-534b8dc1ebac). I'm wondering how I can use this signature to classify my samples in R? Would it involve some sort of k-nearest neighbours method?

Additionally, the signature was identified using microarray data, but I am classifying RNA-seq data. Is there any sort of adjustment I should make to the signature to account for this?

Thanks in advance!

R RNA-Seq • 947 views

Hi garrettbullivant, for the last part of your question, this paper might help if you have not already seen it. They took the RSEM RNA-seq values and the microarray values, standardized them, and centered them around the mean. (I have not done this myself yet, although I am still trying to recreate some parts of the paper as an exercise, but when I saw this question I thought it might help you!) A description of this can be found in the BRS section of the supplement.

DOI: 10.1158/1078-0432.CCR-18-2953
https://clincancerres.aacrjournals.org/content/25/10/3141
Comprehensive Genetic Characterization of Human Thyroid Cancer Cell Lines: A Validated Panel for Preclinical Studies

Iñigo Landa, Nikita Pozdeyev, Christopher Korch, Laura A. Marlow, Robert C. Smallridge, John A. Copland, Ying C. Henderson, Stephen Y. Lai, Gary L. Clayman, Naoyoshi Onoda, Aik Choon Tan, Maria E.R. Garcia-Rendueles, Jeffrey A. Knauf, Bryan R. Haugen, James A. Fagin and Rebecca E. Schweppe
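
To make the standardize-and-center idea concrete, here is a minimal sketch in plain Python. It z-scores each gene across samples and then assigns each sample to the centroid it correlates with best, which is one common way to use reported centroids. Whether this matches what either paper actually did is an assumption on my part, and the gene names, subtype names and numbers below are all made up for illustration.

```python
from statistics import mean, pstdev

def zscore_genes(expr):
    """Center and scale each gene across samples (expr: gene -> list of values)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den > 0 else 0.0

def classify(expr, centroids):
    """Assign each sample to the subtype whose centroid it correlates with best.

    centroids: subtype -> {gene -> centroid value}; only shared genes are used.
    """
    z = zscore_genes(expr)
    n_samples = len(next(iter(z.values())))
    calls = []
    for j in range(n_samples):
        best_subtype, best_r = None, float("-inf")
        for subtype, cen in centroids.items():
            genes = [g for g in cen if g in z]
            r = pearson([z[g][j] for g in genes], [cen[g] for g in genes])
            if r > best_r:
                best_subtype, best_r = subtype, r
        calls.append(best_subtype)
    return calls

# Hypothetical toy data: 4 genes x 4 samples of log-scale expression.
expr = {
    "G1": [8, 9, 2, 1],
    "G2": [7, 8, 1, 2],
    "G3": [1, 2, 9, 8],
    "G4": [2, 1, 8, 7],
}
# Hypothetical centroids for two made-up subtypes "A" and "B".
centroids = {
    "A": {"G1": 1.0, "G2": 1.0, "G3": -1.0, "G4": -1.0},
    "B": {"G1": -1.0, "G2": -1.0, "G3": 1.0, "G4": 1.0},
}
calls = classify(expr, centroids)
print(calls)
```

Because each platform is standardized gene by gene before the comparison, only the relative pattern across genes matters, which is the point of the adjustment. With real data you would read the centroid table from the signature file and restrict to genes present in both datasets.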

4.4 years ago

If they have reported centroids, then I imagine that they have used PAM (partitioning around medoids) clustering, and not k-means or k-NN, but you can check the citation. So, you could, in effect, simply subset the TCGA GBM samples for these genes and then try to identify ideal clusters via various metrics, like:

  • Jaccard Index
  • silhouette method
  • consensus clustering
  • elbow method
  • gap statistic

Once you identify the ideal number of clusters, k, you would then re-perform PAM on the TCGA GBM data with the identified value of k. The idea would be that the original groups identified by the authors will be 'un-earthed' in this way.
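
The choose-k-then-cluster idea can be sketched as follows, using the silhouette method as the metric. This is a plain-Python toy, not a production implementation: the medoid search is a simple greedy swap with a naive start (the first k points), and the 2-D points stand in for samples described by signature genes. All names and data are hypothetical.

```python
def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    """Greedy swap search for k medoids; returns (medoids, labels)."""
    n = len(dist)
    medoids = list(range(k))  # naive initialisation: first k points
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = total_cost(dist, trial)
                if c < cost - 1e-12:  # accept only strict improvements
                    medoids, cost, improved = trial, c, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

def mean_silhouette(dist, labels):
    """Average silhouette width; singleton clusters score 0 by convention."""
    n = len(dist)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n

# Hypothetical toy data: three well-separated groups of three "samples".
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
dist = [[euclid(p, q) for q in points] for p in points]

# Score each candidate k, then re-run the clustering with the best one.
scores = {k: mean_silhouette(dist, pam(dist, k)[1]) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
medoids, labels = pam(dist, best_k)
print(best_k, labels)
```

In R, the closest equivalents I know of are `pam()` and `silhouette()` from the cluster package, applied to the expression matrix subset to the signature genes.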

You could also simply do hierarchical clustering with the subset GBM data and define a tree-cut height to identify the original groups.
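
A small sketch of the tree-cut idea, again in plain Python with made-up data: clusters are merged bottom-up and merging stops once the closest pair is further apart than the chosen height. Average linkage and Euclidean distance are my assumptions here, as is the cut height; other choices are equally reasonable.

```python
def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def hclust_cut(points, height):
    """Average-linkage agglomerative clustering, cut at the given height.

    Returns one integer label per point.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_link(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(euclid(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, a, b = min(
            (avg_link(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > height:  # next merge would be above the cut: stop here
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels

# Hypothetical toy data: three well-separated groups of three "samples".
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

labels_cut = hclust_cut(points, height=5.0)     # cut between the groups
labels_all = hclust_cut(points, height=100.0)   # cut above the whole tree
print(labels_cut, labels_all)
```

In R this corresponds to `hclust()` followed by `cutree()` with its `h` argument (or `k` if you would rather fix the number of groups than the height).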

Many different ways to do it - some more elaborate ways likely exist.

"Additionally, the signature was identified using microarray data, but I am classifying RNA-seq data. Is there any sort of adjustment I should make to the signature to account for this?"

Then this will be a good test of the signature.

Kevin
