8.5 years ago
debitboro
Hi Biostars,
I used HTSeq to generate the following table of read counts per gene per sample (I have 12 biological replicates):
ENSG00000000003 0 0 5 7 0 0 0 0 0 0 12 0
ENSG00000000005 0 0 3 2 0 0 0 0 0 2 4 0
ENSG00000000419 2 2 3 5 18 20 0 2 2 3 13 32
ENSG00000000457 15 6 11 7 129 21 8 90 41 97 129 104
ENSG00000000460 6 2 9 5 62 12 3 30 21 61 78 62
ENSG00000000938 0 0 5 0 16 3 0 16 7 25 32 5
...
...
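To work with a table like this in Python, it first has to be read into a gene-by-sample matrix. The sketch below assumes a whitespace-separated file with no header, one gene ID followed by one count per sample on each line, as shown above. It skips lines starting with `__`, because HTSeq-count appends summary counters (such as `__no_feature` and `__ambiguous`) that are not genes. The function name `read_count_table` is hypothetical.

```python
import io

import numpy as np


def read_count_table(handle):
    """Read a whitespace-separated gene x sample count table (no header).

    Assumes each line is: gene_id count_1 ... count_n.
    Lines starting with '__' (HTSeq-count summary counters) are skipped.
    """
    gene_ids, rows = [], []
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("__"):
            continue  # blank line or HTSeq-count counter, not a gene
        gene_ids.append(fields[0])
        rows.append([int(x) for x in fields[1:]])
    return gene_ids, np.array(rows, dtype=np.int64)


# Usage with a few rows from the question plus one HTSeq-count counter line.
example = """\
ENSG00000000003 0 0 5 7 0 0 0 0 0 0 12 0
ENSG00000000419 2 2 3 5 18 20 0 2 2 3 13 32
ENSG00000000457 15 6 11 7 129 21 8 90 41 97 129 104
__no_feature 10 8 9 12 30 11 4 25 17 22 40 28
"""
gene_ids, counts = read_count_table(io.StringIO(example))
print(gene_ids, counts.shape)
```

For a real file, pass an open file handle instead of `io.StringIO`.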
My data are paired-end RNA-seq data. Now I want to check whether my counts follow a negative binomial (NB) or a Poisson distribution. What is the recommended way to do this?
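A quick first check is to compare each gene's variance with its mean after accounting for library size. Under a Poisson model the variance of a count equals its mean, while under an NB model it grows as mean + alpha × mean². The Python sketch below is illustrative only: it uses just the six rows shown in the question, normalizes by column totals (an assumption; DESeq2, for example, uses median-of-ratios size factors instead), and estimates alpha by the method of moments. A variance ratio near 1 is consistent with Poisson. Ratios well above 1 indicate overdispersion, which is typical for biological replicates. On the full table one would usually plot variance against mean for all genes rather than read off a handful of numbers.

```python
import numpy as np

# The six rows shown in the question (genes x 12 samples): an excerpt only.
gene_ids = ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419",
            "ENSG00000000457", "ENSG00000000460", "ENSG00000000938"]
counts = np.array([
    [0, 0, 5, 7, 0, 0, 0, 0, 0, 0, 12, 0],
    [0, 0, 3, 2, 0, 0, 0, 0, 0, 2, 4, 0],
    [2, 2, 3, 5, 18, 20, 0, 2, 2, 3, 13, 32],
    [15, 6, 11, 7, 129, 21, 8, 90, 41, 97, 129, 104],
    [6, 2, 9, 5, 62, 12, 3, 30, 21, 61, 78, 62],
    [0, 0, 5, 0, 16, 3, 0, 16, 7, 25, 32, 5],
])

# Assumed normalization: size factors from column totals, scaled so their
# geometric mean is 1.
totals = counts.sum(axis=0)
size_factors = totals / np.exp(np.log(totals).mean())
norm = counts / size_factors

mean = norm.mean(axis=1)
var = norm.var(axis=1, ddof=1)

# Under Poisson, Var(y_ij / s_j) = q_i / s_j, so the expected variance of the
# normalized counts across samples is mean * average(1 / s_j).
poisson_var = mean * np.mean(1.0 / size_factors)
ratio = var / poisson_var

# Method-of-moments NB dispersion: Var = poisson_var + alpha * mean**2.
alpha = np.maximum(var - poisson_var, 0.0) / mean**2

for gid, m, rt, a in zip(gene_ids, mean, ratio, alpha):
    print(f"{gid}: mean={m:7.2f}  var/Poisson-var={rt:6.2f}  alpha={a:.3f}")
```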
I appreciate your help.
To check whether gene counts from technical replicates are well approximated by a Poisson distribution, I looked at the SEQC technical replicates in the Bioconductor seqc package. Poisson was a good fit for most genes. It's a bit tricky because library sizes differ across samples, so I used the expected value for each gene × sample as the Poisson rate and looked at the distribution of cdf(count), which was nearly uniform.
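The check described above can be sketched in Python. This is a hedged illustration on simulated data, not the SEQC counts. The expected value for each gene × sample is taken from a simple independence model (row total × column total / grand total), which is one assumption among several ways to account for library size. Because counts are discrete, cdf(count) is not exactly uniform even when the Poisson model is correct. The sketch therefore swaps in a randomized probability integral transform (PIT), which is exactly uniform under the true rate. NB counts with the same means are simulated for contrast. All function names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated stand-in for technical replicates (assumption: NOT the SEQC data).
n_genes = 2000
lib_factors = np.array([0.5, 0.8, 1.0, 1.2, 1.5, 2.0])  # unequal library sizes
gene_means = rng.gamma(shape=2.0, scale=50.0, size=n_genes)
mu_true = gene_means[:, None] * lib_factors[None, :]

pois_counts = rng.poisson(mu_true)
alpha = 0.2  # NB dispersion for the contrast case
r = 1.0 / alpha
# numpy's NB with n=r, p=r/(r+mu) has mean mu and variance mu + alpha * mu**2.
nb_counts = rng.negative_binomial(r, r / (r + mu_true))


def expected_counts(counts):
    """Expected value per gene x sample: row total * column total / grand total."""
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    return row * col / counts.sum()


def randomized_pit(counts, mu, rng):
    """Randomized PIT: u = F(y-1) + V * (F(y) - F(y-1)), V ~ Uniform(0, 1)."""
    upper = stats.poisson.cdf(counts, mu)
    lower = stats.poisson.cdf(counts - 1, mu)  # cdf at -1 is 0
    return lower + rng.uniform(size=counts.shape) * (upper - lower)


def pit_values(counts, rng):
    mu = expected_counts(counts)
    keep = mu.min(axis=1) > 0  # skip genes with a zero rate (all-zero rows)
    return randomized_pit(counts[keep], mu[keep], rng).ravel()


def tail_fraction(u, eps=0.05):
    """Share of PIT values below eps or above 1 - eps (about 2 * eps if uniform)."""
    return np.mean((u < eps) | (u > 1 - eps))


u_pois = pit_values(pois_counts, rng)
u_nb = pit_values(nb_counts, rng)

for name, u in [("Poisson", u_pois), ("NB", u_nb)]:
    ks = stats.kstest(u, "uniform")
    print(f"{name}: tail fraction = {tail_fraction(u):.3f}, KS stat = {ks.statistic:.3f}")
```

Under a Poisson model the PIT values should look roughly flat. The tail fraction may come out slightly below 0.10 because the rate is estimated from the same counts. Overdispersed (NB) counts pile up near 0 and 1, so the tail fraction rises well above 0.10. A histogram of the PIT values is usually the quickest visual check.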