Question

Statistical Distributions in RNA-Seq Data Analysis

2

12.6 years ago

Ngsnewbie ▴ 380

I have learned that RNA-seq data follow a binomial or negative binomial distribution. I am not a statistician, but I have studied the basics of statistics (statistical terms, probability, distributions, and statistical tests) online. The material available online uses examples such as coin flipping, playing cards, and dice throwing, which helped me understand the basic statistics behind these concepts. But when I come to RNA-seq data, I cannot see how those examples carry over.

Can anyone explain (or provide a link that explains) the distribution of RNA-seq data (e.g. binomial or negative binomial) and a statistical test (e.g. the t-test), using an example of RNA-seq count/FPKM data where the input parameters are:

1. Number of genes in the organism

2. Number of reads mapped to these genes

Thanks in Advance :)

statistics rna • 7.3k views

updated 5.6 years ago by Biostar 20 • written 12.6 years ago by Ngsnewbie ▴ 380

5

I don't think you will find a derivation for why the negative binomial is used for RNA-Seq in the same way that, for example, the binomial distribution is used to model card games or the Poisson is used to model the number of customers per hour. In real life, the number of reads counted for any gene tends to vary between individuals more than the Poisson distribution (which is what is usually used for count data) would predict. The negative binomial is used because it matches what is observed more accurately than the Poisson does. As frustrating as this sounds, it is still better than microarrays.
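A minimal sketch of the overdispersion point, using simulated counts for a single hypothetical gene. The mean (`mu = 100`), the dispersion (`alpha = 0.2`), and the function name `simulate_counts` are assumptions chosen for illustration, not values from any real dataset. Under a Poisson model the variance equals the mean, while a negative binomial with dispersion `alpha` has variance `mu + alpha * mu**2`, so the same mean count can spread much more widely across samples.

```python
import numpy as np


def simulate_counts(mu=100.0, alpha=0.2, n_samples=10_000, seed=0):
    """Draw read counts for one hypothetical gene under two models.

    mu    : assumed mean read count for the gene
    alpha : assumed dispersion; NB variance = mu + alpha * mu**2
    """
    rng = np.random.default_rng(seed)

    # Poisson: variance is forced to equal the mean.
    poisson = rng.poisson(mu, size=n_samples)

    # NumPy parameterizes the negative binomial by (n, p).
    # Choosing n = 1/alpha and p = n / (n + mu) gives mean mu
    # and variance mu + alpha * mu**2.
    n = 1.0 / alpha
    p = n / (n + mu)
    negbin = rng.negative_binomial(n, p, size=n_samples)

    return poisson, negbin


poisson, negbin = simulate_counts()

# Both models share the same mean, but the negative binomial variance
# is far larger (theoretical value here: 100 + 0.2 * 100**2 = 2100).
print("Poisson   mean / variance:", poisson.mean(), poisson.var())
print("Neg. bin. mean / variance:", negbin.mean(), negbin.var())
```

If the sample variance of a gene's counts across biological replicates is consistently well above its mean, as it is for the negative binomial draws here, a Poisson model would understate the variability and make differences between conditions look more significant than they are.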

12.6 years ago by Jeremy Leipzig 22k

1

Read the DESeq and edgeR papers. It's well explained in them.

12.6 years ago by Nicolas Rosewick 11k

1

Look at the 5th response (by Simon Anders) in this Seqanswers forum post.

12.6 years ago by Arun 2.4k