I'm currently working on my master's final project, analysing RNA-seq data from breast cancer. I need to find groups in the data and, after trying several clustering methods, I would like to try the PAM50 method.
I have found many papers that discuss PAM50, as well as documentation for the "genefu" package, but I can't find any protocol explaining how to perform a PAM50 analysis.
If anyone knows where I could find one, please let me know.
In case anyone is still looking for an answer: I had the same problem and was able to fix it. First, make sure you have an annotation dataset. These can be found in Bioconductor (e.g., annot.nkis, org.Hs.egALIAS2EG). Then make sure that this annotation dataset has a column named "EntrezGene.ID", in addition to whatever other columns it already contains, such as gene_name. If you are doing predictions (as with PAM50), make sure you are using intrinsic.cluster.predict() and not intrinsic.cluster(): the former applies an existing model to your data, while the latter fits a new one. In the end you should have three objects for your predictions: the pam50 model from genefu (data(pam50)), your expression matrix, which SHOULD HAVE SAMPLES IN ROWS AND GENES IN COLUMNS (this seems to be the most confusing part), and the annotation data.frame with a column named "EntrezGene.ID".
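The three objects described above could be prepared roughly as in the sketch below. It is only an illustration under stated assumptions, not a verified protocol: the expression values and sample names are invented, predict_pam50() is a hypothetical wrapper name, and the call inside it assumes the intrinsic.cluster.predict(sbt.model, data, annot, do.mapping) arguments given in the genefu documentation. The wrapper is defined but not run here, because a three-gene toy matrix is far too small for a real PAM50 call.

```r
# Toy expression matrix as it often arrives: genes in rows, samples in columns.
# All values are invented for illustration.
expr <- matrix(
  c(8.1, 7.9, 2.3,
    3.2, 9.5, 3.0,
    6.0, 6.4, 8.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ESR1", "ERBB2", "MKI67"), c("S1", "S2", "S3"))
)

# genefu expects samples in rows and genes in columns, so transpose.
dat <- t(expr)

# Annotation data.frame: one row per gene, row names matching colnames(dat),
# and a column named exactly "EntrezGene.ID".
annot <- data.frame(
  Gene.Symbol = colnames(dat),
  EntrezGene.ID = c(2099, 2064, 4288),  # Entrez IDs of ESR1, ERBB2, MKI67
  row.names = colnames(dat),
  stringsAsFactors = FALSE
)

# Sanity checks on the layout before handing anything to genefu.
stopifnot(
  identical(rownames(annot), colnames(dat)),
  "EntrezGene.ID" %in% colnames(annot)
)

# Hypothetical wrapper around the genefu call (assumes genefu is installed).
# do.mapping = TRUE asks genefu to map genes through annot[, "EntrezGene.ID"].
predict_pam50 <- function(dat, annot) {
  data("pam50", package = "genefu", envir = environment())
  genefu::intrinsic.cluster.predict(
    sbt.model = pam50, data = dat, annot = annot, do.mapping = TRUE
  )
}
```

With a real samples-by-genes matrix and its annotation, something like res <- predict_pam50(dat, annot) followed by table(res$subtype) should then summarise the predicted subtypes, assuming the returned list has a subtype element as the package documentation describes.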
Thank you so much for the information. I'm trying to run the intrinsic.cluster() function as you suggested, and I get the following error:
Error in intrinsic.cluster(data = counts, annot = annot, do.mapping = FALSE, :
no probe in common -> annot or mapping parameters are necessary for the mapping process!
where annot is a matrix of annotations with at least one column named "EntrezGene.ID" and with properly defined dimnames.
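One way to narrow down an error like this is to check whether the column names of the data overlap at all with the gene identifiers the model uses. The sketch below assumes, as the wording of the message suggests, that with do.mapping = FALSE the column names are matched directly against the model's gene names. shared_genes() is a hypothetical helper, not a genefu function, and the toy matrix is invented for illustration.

```r
# Hypothetical helper (not part of genefu): list which model genes are
# present among the column names of a samples-by-genes matrix.
shared_genes <- function(dat, model_genes) {
  intersect(model_genes, colnames(dat))
}

# Toy data, invented for illustration: columns are Entrez IDs stored as text,
# while the model genes are given as gene symbols.
dat <- matrix(0, nrow = 2, ncol = 3,
              dimnames = list(c("S1", "S2"), c("2099", "2064", "4288")))
model_genes <- c("ESR1", "ERBB2", "MKI67")

# Identifiers of different types never match, so nothing is found.
found <- shared_genes(dat, model_genes)

# After renaming the columns to the identifier type the model uses,
# every model gene is found.
colnames(dat) <- c("ESR1", "ERBB2", "MKI67")
found_after <- shared_genes(dat, model_genes)
```

For the real PAM50 model, model_genes would presumably come from something like rownames(pam50$centroids) after data(pam50); that is an assumption about the object's layout and worth confirming with str(pam50). An empty overlap would point to an identifier mismatch or a transposed matrix rather than to missing genes.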
I'm also using the genefu package to do PAM50 prediction from RNA-seq data. My question is: should I use raw data (read counts), or should I transform my data before the prediction?
I have a gene signature similar to PAM50 for a different tumor type. How should I use the intrinsic.cluster() or intrinsic.cluster.predict() function in the genefu package to build a nearest-centroid classifier?
I ran intrinsic.cluster() with the 3 centroids from my signature genes, but I got cluster.1, cluster.2 and cluster.3. How do I associate these clusters with the centroid subtypes in my signature?
Data passed to intrinsic.cluster():
700 samples x 20000 genes (expression values, genes as columns)
20000 genes x 3 columns of annotation
Annotation of the 500 signature genes
I don't know how to incorporate the predefined subtypes I have.
Please see also: Where To Download Pam50 Gene Set?
Hi Maria, did you solve the problem you had 5 months ago? If so, would you share how? Thanks :)
I also ran into the same issue:
I have a gene expression matrix with samples as rows and genes (symbols, not probes) as columns.
I created an annotation matrix with the gene symbols and the EntrezGene_ID.
When running the intrinsic.cluster function on my data I get:
Looking at the function code, this originates from
This should be false, as there are PAM50 genes present in my data, as can be seen this way:
Any help would be greatly appreciated.
Please post this as a new question.