10.6 years ago
jack
I have collected gene expression data for one cancer type and I want to infer a gene regulatory network from it.
Before reconstructing the network, I want to make sure that my data is good enough.
Which statistical tests can I run to get a feeling for whether my data is good enough? I'm thinking about a t-test.
All of my data come from breast cancer. I want to check the quality of the data before using it. Is it a good idea to plot the variance and correlation, or run t-tests, for every gene or transcript across samples from the same disease?
Try out arrayQualityMetrics in Bioconductor to assess the quality of your array data. You might also want to do a PCA or dendrogram plot to look at the variance of the data. Boxplots and density plots will tell you how well the data normalised, too.
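As a rough illustration of the boxplot idea, here is a minimal Python sketch (standard library only) that computes the quartiles a boxplot would draw for each sample on a log2 scale, and flags samples whose median sits far from the others. The function names, the `tolerance` cut-off of 1 log2 unit, and the toy values are all assumptions made up for this example; they do not come from arrayQualityMetrics or any other package.

```python
import math
import statistics


def log2_summary(values):
    """Return (q1, median, q3) of log2(x + 1) for one sample's expression values."""
    logged = [math.log2(v + 1.0) for v in values]
    q1, med, q3 = statistics.quantiles(logged, n=4)
    return q1, med, q3


def flag_shifted_samples(expr, tolerance=1.0):
    """Flag samples whose median log2 expression is far from the typical median.

    expr maps a sample name to a list of expression values (same gene order
    in every sample). `tolerance` is an arbitrary, hypothetical cut-off.
    """
    medians = {sample: log2_summary(values)[1] for sample, values in expr.items()}
    centre = statistics.median(medians.values())
    return sorted(s for s, m in medians.items() if abs(m - centre) > tolerance)


# Toy values invented for illustration; s3 is shifted upwards on purpose.
expr = {
    "s1": [0, 3, 7, 15, 31, 63],
    "s2": [1, 3, 7, 15, 31, 63],
    "s3": [0, 63, 127, 255, 511, 1023],
}

for sample, values in expr.items():
    print(sample, log2_summary(values))
print("shifted samples:", flag_shifted_samples(expr))
```

If the quartiles line up across samples after normalisation, the distributions are comparable; a sample that stays shifted is worth a closer look before it goes into any model.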
But my expression data come from NGS, not microarrays. I think the metrics used for microarrays are not applicable to RNA-seq data.
What pipeline are you using to analyse the RNA Seq data?
I don't use any pipeline. I'm trying to learn a statistical model.
OK, if you want to learn about the statistical models used for normalising RNA-seq data, check out http://cufflinks.cbcb.umd.edu/manual.html#cuffnorm
Cuffnorm is the program from the tuxedo pipeline used for normalising RNA seq datasets. If you look in the manual, there are a few parameters for tailoring normalisation http://cufflinks.cbcb.umd.edu/manual.html#library_norm_meth
That might give you an indication of where to start
Basically, my concern is to carry out some statistical tests to check the quality of the RNA-seq data before giving it to my model.
So the question for me is: which statistical tests show the quality of the data? My data belong to one cancer type.
This is why using a pipeline would help, especially something like the Tuxedo pipeline, as it includes an R package (CummeRbund) that provides extensive QC tools. It also provides Cuffnorm (which I mentioned earlier), which produces the normalised expression set.
If I were you, I'd run TopHat, Cufflinks, Cuffmerge and Cuffdiff. Load the Cuffdiff output into R and check out the QC tools in there; that will show you how good your data actually is. Then you can use Cuffnorm to get the normalised expression set out and throw that into your model.
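Once you have a normalised expression table, one simple sanity check that needs no pipeline is sample-to-sample correlation: samples from the same cancer type should mostly correlate well with each other, and one that does not is a candidate outlier. The Python sketch below is only an illustration of that idea, not what CummeRbund does internally; the function names, the 0.8 threshold, and the toy log-scale values are assumptions invented for the example.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_correlation_per_sample(expr):
    """For each sample, the median of its correlations with all other samples."""
    names = list(expr)
    return {
        s: statistics.median(pearson(expr[s], expr[t]) for t in names if t != s)
        for s in names
    }


def low_correlation_samples(expr, threshold=0.8):
    """Samples whose median correlation falls below a hypothetical threshold."""
    scores = median_correlation_per_sample(expr)
    return sorted(s for s, r in scores.items() if r < threshold)


# Toy log-scale expression values (same gene order); s4 is scrambled on purpose.
expr = {
    "s1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "s2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "s3": [0.9, 2.2, 2.9, 4.1, 4.8],
    "s4": [5.0, 1.0, 4.0, 2.0, 3.0],
}

print(median_correlation_per_sample(expr))
print("possible outliers:", low_correlation_samples(expr))
```

The median is used instead of the mean so that a single bad sample does not drag down the score of every good one. On real data you would typically compute this on log-transformed, normalised values and choose the threshold by looking at the distribution of scores.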