Hi
I have been doing some bioinformatic analysis of TCGA data as an adjunct to my PhD. I would like to take this a little further and analyse RNA-Seq expression data against clinical parameters such as disease stage, age, sex, and histological markers of invasion. Has anyone done this kind of analysis? What statistical methods do you use for it? I am assuming that plain ANOVA is not appropriate for such a large dataset, given how many comparisons are being made across it? Bioinformatic statistics is a very new area to me.
Also, could anyone recommend packages for this type of analysis? I have started using the amazing RStudio and TCGAbiolinks, but there doesn't appear to be a package in the Bioconductor guide that is suitable for this.
Really grateful for any advice guys :-)
Hi elizabethR, I am not very familiar with TCGA data, but if you want to do a class comparison test between two or more phenotypes, you should first preprocess your data (including log transformation, summarization, and normalization). If you want to use GEO data, I suggest using InsilicoDB (https://insilicodb.com) to retrieve preprocessed data; it also provides a pipeline that makes the job simpler. In addition, limma is a robust package for analyzing gene expression data: it can fit a linear model to find markers of each phenotype. https://bioconductor.org/packages/release/bioc/html/limma.html
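On the multiple-comparisons worry in the question: the usual approach is to run the test per gene and then adjust the resulting p-values for the number of genes tested. limma's topTable reports Benjamini-Hochberg adjusted p-values by default. Below is a minimal sketch of that adjustment in plain Python, only to show what the correction does. It is not limma's code, and the gene names and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (e.g. one test per gene across disease stages).
raw = {
    "GENE_A": 0.001,
    "GENE_B": 0.008,
    "GENE_C": 0.039,
    "GENE_D": 0.041,
    "GENE_E": 0.60,
}

adj = benjamini_hochberg(list(raw.values()))
for (gene, p), q in zip(raw.items(), adj):
    print(f"{gene}\traw p = {p:.3f}\tadjusted p = {q:.5f}")
```

In this made-up example GENE_C and GENE_D are below 0.05 before adjustment but not after, which is the kind of difference the correction makes when many genes are tested at once.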