I'd like to use the RNA-seq data from CCLE found at https://portals.broadinstitute.org/ccle/data. Where can I find more information about the methods used to process the data?
In particular, look at Supplementary Material 1 (Word doc) - it has information on each of the following:
Supplementary Methods
1. Cell line selection and annotation
2. DNA and RNA extraction
3. Genomic characterization
SNP Arrays:
DNA identity analysis by mass spectrometric SNP genotyping
4. Cell line identity validation by SNP genotyping
5. Gene set activity scores
6. Cell line-to-primary tumor comparison
Copy-number comparison:
Expression comparison:
Mutation-rate comparison
7. Pharmacological characterization
8. Prediction of drug response
8.1. Sensitivity prediction using regression analysis
8.2. Sensitivity prediction using categorical analysis
9. AHR validation experiments
Cell lines and cell culture conditions
Lentivirally delivered short hairpin RNA
Immunoblot analysis:
Analysis of mRNA expression by quantitative RT-PCR (qRT-PCR)
Growth curves
Pharmacologic growth inhibition curves
10. Data sharing/release
All raw and processed data are available at the CCLE website: www.broadinstitute.org/ccle. In addition, the website offers direct links to data visualization tools such as IGV (ref. 39), as well as GenePattern-based analysis tools (ref. 40) for expression and copy-number class comparison analyses.
Hello! Many thanks for your response! Unfortunately, they do not mention the procedures for RNA-seq processing, and perhaps the procedures have changed in the last 6 years?
The RNA-seq data have been published since the release of the original paper.
@i.sudbery thanks. Do you have the reference for the publication?
Take a look here: CCLE RNA-seq library protocol
This is great, thanks so much!
You're welcome, my friend.
Where can I find information on how they analyzed the RNA-seq data? Thank you!
Read the supplementary material.
Thanks, Kevin. For an up-to-date reference, I found the RNA-seq analysis details in this paper: Next-generation characterization of the Cancer Cell Line Encyclopedia
Hi Kevin,
Where can I find PAM50 subtype information for the CCLE breast cancer cell lines based on RNA-seq data (not microarray)?
Thank you
Hello my friend. Where have you looked already? The CCLE may not report the PAM50 subtypes.
I have seen that it is reported in this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001206/#MOESM1), but it is based on microarray data. I need it for RNA-seq data.
I see, but those authors appear to have done the work on their own. That is, you extract the PAM50 genes from the data and then perform some clustering analysis in order to see how your samples are grouped based on the PAM50 genes. PAM50 is just a list of genes, after all. One can explore the utility of the signature in any expression dataset.
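As a rough illustration of the approach described above (subset an expression matrix to the PAM50 genes, then cluster the samples), here is a minimal pure-Python sketch. It assumes the expression data are a dict mapping gene symbol to a list of per-sample values (e.g. TPM-like, non-negative). The log2(x + 1) transform, per-gene z-scoring, 1 - Pearson r distance and average linkage are illustrative choices only, not the method used in the linked paper and not the published centroid-based PAM50 classifier. The six-gene list and the toy values are hypothetical stand-ins; use the full gene list from the thread linked below.

```python
import math
from statistics import mean, pstdev


def subset_genes(expr, genes):
    """Keep only genes from `genes` that are present in `expr` (gene -> per-sample values)."""
    return {g: expr[g] for g in genes if g in expr}


def zscore_rows(expr):
    """log2(x + 1)-transform each gene, then z-score it across samples."""
    out = {}
    for g, vals in expr.items():
        logged = [math.log2(v + 1.0) for v in vals]
        mu, sd = mean(logged), pstdev(logged)
        out[g] = [(x - mu) / sd if sd > 0 else 0.0 for x in logged]
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def cluster_samples(expr, n_clusters):
    """Average-linkage agglomerative clustering of samples with 1 - Pearson r distance."""
    genes = list(expr)
    n = len(next(iter(expr.values())))
    # One profile per sample across the selected genes.
    profiles = [[expr[g][j] for g in genes] for j in range(n)]
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: 4 samples; GAPDH is not in the gene list and is dropped,
# and FOXA1 is in the list but missing from the data, so it is skipped.
toy_expr = {
    "ESR1": [900, 850, 10, 12],
    "PGR": [300, 280, 5, 4],
    "ERBB2": [50, 60, 40, 45],
    "KRT5": [8, 10, 700, 650],
    "KRT17": [6, 9, 500, 520],
    "MKI67": [40, 35, 200, 210],
    "GAPDH": [5000, 5100, 4900, 5050],
}
pam50_subset = ["ESR1", "PGR", "ERBB2", "KRT5", "KRT17", "MKI67", "FOXA1"]
clusters = cluster_samples(zscore_rows(subset_genes(toy_expr, pam50_subset)), n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

On real data you would then inspect which known subtype markers drive each cluster; the grouping alone does not assign PAM50 labels.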
If you just need to know the genes, then take a look here: Where To Download Pam50 Gene Set?