Question

Check if RNASeq count data follow NB or Poisson distributions

0

8.5 years ago

debitboro ▴ 270

Hi Biostars,

I have used HTSeq to generate the following table of read counts per gene per sample (I have 12 biological replicates):

ENSG00000000003  0  0  5  7   0  0  0  0   0   0  12   0
ENSG00000000005  0  0  3  2   0  0  0  0   0   2   4   0
ENSG00000000419  2  2  3  5  18 20  0  2   2   3  13  32
ENSG00000000457 15  6 11  7 129 21  8 90  41  97 129 104
ENSG00000000460  6  2  9  5  62 12  3 30  21  61  78  62
ENSG00000000938  0  0  5  0  16  3  0 16   7  25  32   5
...
...

My data are paired-end RNASeq data. Now I want to check whether my count data follow an NB or a Poisson distribution. What is the recommended way to do this?

I appreciate your help.

RNA-Seq Negative Binomial Poisson distribution • 1.5k views

ADD COMMENT • link updated 8.5 years ago by Devon Ryan 104k • written 8.5 years ago by debitboro ▴ 270

score 1 · Answer 1 · 2016-05-25

1

8.5 years ago

Devon Ryan 104k

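Plot the variance as a function of the mean (use normalized counts). If the relationship is linear, with the variance roughly equal to the mean (it won't be unless you're working on a cell line or something simple like that), then it's Poisson. Variance that grows faster than the mean indicates overdispersion, which is what the NB models.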
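A minimal sketch of that check, using only the six genes shown in the question. The normalization here is a plain total-count scaling, which is my assumption and not something the answer specifies; with a real dataset you would use all genes and whatever normalization your pipeline provides. The names `size_factors` and `mean_variance` are made up for illustration.

```python
from statistics import mean, variance

# Counts copied from the table in the question (6 genes x 12 samples).
counts = {
    "ENSG00000000003": [0, 0, 5, 7, 0, 0, 0, 0, 0, 0, 12, 0],
    "ENSG00000000005": [0, 0, 3, 2, 0, 0, 0, 0, 0, 2, 4, 0],
    "ENSG00000000419": [2, 2, 3, 5, 18, 20, 0, 2, 2, 3, 13, 32],
    "ENSG00000000457": [15, 6, 11, 7, 129, 21, 8, 90, 41, 97, 129, 104],
    "ENSG00000000460": [6, 2, 9, 5, 62, 12, 3, 30, 21, 61, 78, 62],
    "ENSG00000000938": [0, 0, 5, 0, 16, 3, 0, 16, 7, 25, 32, 5],
}


def size_factors(counts):
    """Per-sample scaling factors: library total divided by the average total.

    Assumption: simple total-count normalization, used only for illustration.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    avg_total = mean(totals)
    return [t / avg_total for t in totals]


def mean_variance(counts):
    """Return {gene: (mean, sample variance)} of the normalized counts."""
    sf = size_factors(counts)
    result = {}
    for gene, row in counts.items():
        normalized = [c / s for c, s in zip(row, sf)]
        result[gene] = (mean(normalized), variance(normalized))
    return result


stats = mean_variance(counts)
for gene, (m, v) in stats.items():
    # Under Poisson, var/mean should sit near 1; values well above 1
    # point to overdispersion.
    print(f"{gene}\tmean={m:.2f}\tvar={v:.2f}\tvar/mean={v / m:.2f}")
```

Plotting variance against mean on log-log axes for all genes gives the picture described above: Poisson data hug the line variance = mean, overdispersed data curve above it.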

0

To see if gene counts from technical replicates are well approximated by Poisson, I've tried looking at the SEQC technical replicates, using the Bioconductor seqc package. Poisson was a good fit for most genes. It's a bit tricky because of differences in library size across samples, so I used the expected value for each gene x sample combination as the rate of the Poisson and looked at the distribution of cdf(count). This was nearly uniform, which is what you'd expect if the Poisson model holds.
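A rough sketch of that idea in Python, not the code used for the SEQC analysis. I'm assuming the expected value for a gene in a sample is (gene total x sample total) / grand total, which is one common way to account for library size; the reply doesn't say how the expected values were computed. The counts are the six genes from the question, which are biological replicates, so the CDF values should not be expected to look uniform here. `poisson_cdf` and `expected_counts` are illustrative names.

```python
import math

# Counts copied from the table in the question (6 genes x 12 samples).
counts = [
    [0, 0, 5, 7, 0, 0, 0, 0, 0, 0, 12, 0],
    [0, 0, 3, 2, 0, 0, 0, 0, 0, 2, 4, 0],
    [2, 2, 3, 5, 18, 20, 0, 2, 2, 3, 13, 32],
    [15, 6, 11, 7, 129, 21, 8, 90, 41, 97, 129, 104],
    [6, 2, 9, 5, 62, 12, 3, 30, 21, 61, 78, 62],
    [0, 0, 5, 0, 16, 3, 0, 16, 7, 25, 32, 5],
]


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)  # guard against floating-point overshoot


def expected_counts(counts):
    """Expected count per gene x sample: row total * column total / grand total.

    Assumption: this independence-model estimate stands in for the
    'expected value' mentioned in the reply.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


expected = expected_counts(counts)
cdf_values = [
    poisson_cdf(k, lam)
    for row, exp_row in zip(counts, expected)
    for k, lam in zip(row, exp_row)
]

# Crude histogram in 10 equal-width bins; roughly flat bars would be
# consistent with Poisson, piles near 0 and 1 suggest overdispersion.
hist = [0] * 10
for p in cdf_values:
    hist[min(int(p * 10), 9)] += 1
print(hist)
```

Because counts are discrete, cdf(count) is only approximately uniform even for true Poisson data, especially at low rates, so read the histogram as a rough diagnostic.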

ADD REPLY • link 8.5 years ago by Michael Love ★ 2.6k