Question

Statistical tests to sanity-check expression data before inferring a network

1

11.1 years ago

jack ▴ 990

I have collected gene expression data for one cancer type, and I want to infer a gene regulatory network from it.

Before reconstructing the network, I want to make sure that my data are good enough.

Which statistical tests can I run to get a feeling for whether my data are good enough? I'm thinking about a t-test.

sequencing genome • 4.7k views

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by jack ▴ 990

Ram · Answer 1 · 2014-04-22

1

11.1 years ago

andrew.j.skelton73 6.6k

Student's t-test with Benjamini-Hochberg multiple-testing correction is the traditional means of statistically testing the robustness of expression data.
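
To make the correction step concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. It assumes you already have one p-value per gene (for example from a per-gene t-test between two groups); the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values, not real data.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
qvals = benjamini_hochberg(pvals)
print(qvals)
```

Genes whose adjusted value falls below your chosen false discovery rate (0.05 is a common choice) would be called significant. Note that this needs two groups to compare, so it says more about differential expression than about data quality as such.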

ADD COMMENT • link 11.1 years ago by andrew.j.skelton73 6.6k

0

All of my data come from breast cancer. I want to check the quality of the data before using it. Is it a good idea to plot the variance and correlation, and to run t-tests, for every gene or transcript across different samples from the same disease?

ADD REPLY • link 11.1 years ago by jack ▴ 990

1

Try out the arrayQualityMetrics package in Bioconductor to assess the quality of your array data. You might also want to do a PCA or dendrogram plot to look at the variance of the data. Boxplots and density plots will tell you how well the data normalised too.
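
As a rough numeric stand-in for the boxplot check, the sketch below computes the quartiles of log2-transformed expression per sample and flags samples whose median sits far from the others. It uses only the Python standard library; the function names, the `max_shift` threshold of 1 log2 unit, and the toy values are all assumptions chosen for illustration, not a standard.

```python
import math
import statistics


def sample_summaries(expression):
    """Map each sample name to (Q1, median, Q3) of log2(value + 1)."""
    summaries = {}
    for sample, values in expression.items():
        logged = [math.log2(v + 1) for v in values]
        q1, med, q3 = statistics.quantiles(logged, n=4, method="inclusive")
        summaries[sample] = (q1, med, q3)
    return summaries


def flag_outlier_samples(summaries, max_shift=1.0):
    """Return samples whose median is more than max_shift log2 units
    away from the median of all sample medians."""
    medians = {s: q[1] for s, q in summaries.items()}
    centre = statistics.median(medians.values())
    return [s for s, m in medians.items() if abs(m - centre) > max_shift]


# Toy expression values (same genes, same order in every sample).
expression = {
    "s1": [10, 200, 35, 0, 1500],
    "s2": [12, 180, 40, 1, 1400],
    "s3": [90, 1900, 300, 5, 16000],  # roughly 10x higher overall
}
summaries = sample_summaries(expression)
flagged = flag_outlier_samples(summaries)
print(flagged)
```

With these toy values only `s3` is flagged, which is what a boxplot would show as one box sitting well above the rest. If the data were well normalised, the per-sample medians should be close to each other.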

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.1 years ago by andrew.j.skelton73 6.6k

0

But my expression data come from NGS, not microarrays. I think the metrics used for microarrays are not applicable to RNA-seq data.

ADD REPLY • link 11.1 years ago by jack ▴ 990

0

What pipeline are you using to analyse the RNA-seq data?

ADD REPLY • link 11.1 years ago by andrew.j.skelton73 6.6k

0

I don't use any pipeline. I'm trying to learn a statistical model.

ADD REPLY • link 11.1 years ago by jack ▴ 990

1

OK, if you want to learn about the statistical models used for normalising RNA-seq data, check out http://cufflinks.cbcb.umd.edu/manual.html#cuffnorm

Cuffnorm is the program from the Tuxedo pipeline used for normalising RNA-seq datasets. If you look in the manual, there are a few parameters for tailoring normalisation: http://cufflinks.cbcb.umd.edu/manual.html#library_norm_meth

That might give you an indication of where to start.
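
To give a feel for what a library-size normalisation does, here is a minimal sketch of the median-of-ratios idea (the approach described by Anders and Huber). I believe this is the idea behind the method the Cufflinks manual calls geometric, but treat that mapping as an assumption and check the manual. The function name `size_factors` and the toy counts are hypothetical, and this is not Cuffnorm's actual implementation.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factor for each sample.

    counts maps sample name -> list of raw counts, with genes in the
    same order in every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean per gene, skipping genes with a zero in any sample.
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[g] = sum(math.log(v) for v in values) / len(values)
    factors = {}
    for s in samples:
        # Log ratio of each gene's count to its geometric mean across samples.
        log_ratios = [math.log(counts[s][g]) - lgm
                      for g, lgm in log_geo_means.items()]
        factors[s] = math.exp(statistics.median(log_ratios))
    return factors


# Toy counts: sample "b" was sequenced about twice as deeply as "a".
counts = {
    "a": [10, 20, 0, 40],
    "b": [20, 40, 5, 80],
}
factors = size_factors(counts)
normalised = {s: [c / factors[s] for c in counts[s]] for s in counts}
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale, so in this toy case the non-zero genes end up with matching values in `a` and `b`.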

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.1 years ago by andrew.j.skelton73 6.6k

0

Basically, my concern is to carry out some statistical tests to check the quality of the RNA-seq data before giving it to my model.

So the question for me is: which statistical tests show the quality of the data? My data belong to one cancer type.

ADD REPLY • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by jack ▴ 990

0

This is why using a pipeline would help, especially something like the Tuxedo pipeline, which includes an R package (cummeRbund) that provides extensive QC tools. It also provides Cuffnorm (which I mentioned earlier), which produces the normalised expression set.

If I were you, I'd run TopHat, Cufflinks, Cuffmerge, and Cuffdiff. Load the Cuffdiff output into R and check out the QC tools in there; that will show you how good your data actually are. Then you can use Cuffnorm to get the normalised expression set out and throw that into your model.

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.1 years ago by andrew.j.skelton73 6.6k

Ram · Answer 2 · 2014-04-22

0

11.1 years ago

andrew.j.skelton73 6.6k

Post updated above.

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 11.1 years ago by andrew.j.skelton73 6.6k