Dear all,
I am writing a program to study the read coverage of a single sequence. In summary, the pipeline is:
- Detect ORFs in the input sequence
- Align all RNA-seq reads to the sequence (Bowtie)
- Count the number of reads in each ORF (using the 5' end of each read)
- Normalize these counts
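The counting step (assigning each read by its 5' end to the ORF that contains it) can be sketched as follows. This is a minimal sketch that assumes the Bowtie alignments have already been converted into `(start, end, strand)` tuples in 0-based, half-open reference coordinates. The input format and the function names `five_prime_position` and `count_5prime` are hypothetical.

```python
def five_prime_position(start, end, strand):
    """Return the 0-based reference coordinate of a read's 5' end.

    Coordinates are 0-based and half-open, [start, end). On the "+" strand
    the 5' end is the leftmost aligned base; on the "-" strand it is the
    rightmost aligned base, end - 1.
    """
    return start if strand == "+" else end - 1


def count_5prime(reads, orfs, stranded=True):
    """Count, for each ORF, the reads whose 5' end falls inside it.

    reads: iterable of (start, end, strand) tuples (hypothetical input format).
    orfs:  iterable of (name, start, end, strand) tuples, 0-based half-open.
    If stranded is True, a read is only counted for ORFs on the same strand.
    ORFs can overlap, so one read may be counted in more than one ORF.
    """
    orfs = list(orfs)
    counts = {name: 0 for name, _, _, _ in orfs}
    for r_start, r_end, r_strand in reads:
        pos = five_prime_position(r_start, r_end, r_strand)
        for name, o_start, o_end, o_strand in orfs:
            if not (o_start <= pos < o_end):
                continue  # 5' end lies outside this ORF
            if stranded and r_strand != o_strand:
                continue  # wrong strand for this ORF
            counts[name] += 1
    return counts


# Toy example: three ORFs on one sequence, five aligned reads.
orfs = [("orf1", 0, 300, "+"), ("orf2", 250, 600, "+"), ("orf3", 700, 900, "-")]
reads = [(10, 60, "+"), (260, 310, "+"), (550, 600, "+"), (800, 850, "-"), (840, 890, "+")]
print(count_5prime(reads, orfs))  # {'orf1': 2, 'orf2': 2, 'orf3': 1}
```

With `stranded=True` the sketch assumes reads align in the same orientation as the ORF they come from. For a reverse-stranded library the strand comparison would have to be inverted. The nested loop is fine for 6 to 10 ORFs. A genome-scale version would use an interval index instead.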
Some input sequences have only 6 to 10 ORFs. I want to normalize these counts, and I tried DESeq2, which works fine (technically speaking).
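DESeq2's default normalisation is the median-of-ratios method. Each sample's size factor is the median, over features, of that sample's count divided by the feature's geometric mean across samples. The NumPy re-implementation below (not DESeq2 itself, which is an R package; `size_factors` is a hypothetical name) shows why the number of ORFs matters. The median is taken over only 6 to 10 ratios, and the method relies on most features not changing between samples. Once half of the ORFs change, the estimate moves.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: 2-D array-like, rows = features (here ORFs), columns = samples.
    Features with a zero count in any sample are dropped, because their
    geometric mean across samples is zero.
    """
    counts = np.asarray(counts, dtype=float)
    keep = np.all(counts > 0, axis=1)
    if not keep.any():
        raise ValueError("every feature has a zero count in at least one sample")
    log_counts = np.log(counts[keep])
    # Log geometric mean of each feature across samples (the pseudo-reference).
    log_ref = log_counts.mean(axis=1)
    # Log ratio of each sample to the pseudo-reference, per feature.
    log_ratios = log_counts - log_ref[:, None]
    # The size factor is the median ratio over the retained features.
    return np.exp(np.median(log_ratios, axis=0))


# Six ORFs, two samples: five ORFs differ two-fold, one differs ten-fold.
balanced = [[100, 200], [50, 100], [30, 60], [80, 160], [10, 20], [40, 400]]
# Three of the six ORFs now differ ten-fold instead of two-fold.
shifted = [[100, 200], [50, 100], [30, 60], [80, 800], [10, 100], [40, 400]]
print(size_factors(balanced))
print(size_factors(shifted))
```

For `balanced` the factors are sqrt(0.5) and sqrt(2), so the single outlying ORF has no influence. For `shifted` the median falls between the two groups of ratios and the factors become about 0.473 and 2.115.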
Now, statistically speaking, do you think that estimating dispersion and normalizing counts with DESeq2 for only 6 to 10 genes is valid? And how will the adjusted p-values be affected, given that so few genes are provided for multiple testing?
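On the multiple-testing part: DESeq2 reports Benjamini-Hochberg (BH) adjusted p-values (`padj`) by default. BH is defined for any number of tests m. The i-th smallest p-value is multiplied by m/i, and a running minimum keeps the results monotone. With 6 to 10 ORFs the correction is therefore much milder than across thousands of genes. The sketch below is a plain NumPy version (`bh_adjust` is a hypothetical name). DESeq2 also applies independent filtering by default, so its `padj` can differ from this sketch.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, same rule as R's p.adjust(p, "BH").

    The i-th smallest of m p-values is multiplied by m / i, then a running
    minimum from the largest p-value downwards keeps the result monotone.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)[::-1]      # indices from largest to smallest p
    ranks = np.arange(m, 0, -1)      # ranks m, m-1, ..., 1 in that order
    scaled = p[order] * m / ranks    # p_(i) * m / i
    adjusted_desc = np.minimum.accumulate(scaled)
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_desc, 1.0)  # cap at 1, restore input order
    return adjusted


# Six ORFs: the largest multiplier is m / 1 = 6.
print(bh_adjust([0.01, 0.02, 0.03, 0.04, 0.05, 0.5]))
```

Here the first five p-values all adjust to 0.06 and the last stays at 0.5.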
I would appreciate any comments or suggestions from people experienced with statistics and RNA-seq data normalization.
Thank you!
----- EDIT ------
Since the data do not satisfy the assumption mentioned in C. Yague's answer, what kind of count-based normalisation can be applied? I was thinking about RPKM, but RPKM is more a unit than a normalisation method. Or should I use something like TPM, and then compute fold changes from the TPM values?
Thank you again for your help!
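RPKM and TPM use the same ingredients in a different order. RPKM divides counts by library size (in millions) and by feature length (in kb). TPM divides by length first and then rescales so that each sample sums to one million. The sketch below uses hypothetical function names, ORF lengths and counts. The log2 fold change with a pseudocount of 1 is one common but arbitrary choice, not something prescribed in this thread.

```python
import numpy as np


def rpkm(counts, lengths_bp, library_size=None):
    """Reads per kilobase of feature per million reads, for one sample.

    library_size defaults to the sum of the counts given; pass the total
    number of mapped reads instead if that is the intended denominator.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    if library_size is None:
        library_size = counts.sum()
    return counts / lengths_kb / (library_size / 1e6)


def tpm(counts, lengths_bp):
    """Transcripts per million for one sample: length-normalise first,
    then rescale so the values of this sample sum to 1e6."""
    counts = np.asarray(counts, dtype=float)
    rate = counts / (np.asarray(lengths_bp, dtype=float) / 1e3)
    return rate / rate.sum() * 1e6


def log2_fold_change(a, b, pseudocount=1.0):
    """log2(b / a) with a pseudocount (the pseudocount value is an arbitrary choice)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.log2((b + pseudocount) / (a + pseudocount))


# Hypothetical ORF lengths (bp) and counts for two samples.
lengths = [1000, 2000, 500]
sample_a = [100, 200, 300]
sample_b = [100, 200, 900]
tpm_a, tpm_b = tpm(sample_a, lengths), tpm(sample_b, lengths)
print(tpm_a)
print(tpm_b)
print(log2_fold_change(tpm_a, tpm_b))
```

The first two ORFs have identical counts in both samples, yet their TPM drops from 125,000 to 50,000. This happens because TPM values within a sample always sum to one million. With only a handful of ORFs, one strongly changing ORF shifts all the others. RPKM shows the same effect when `library_size` is left as the sum of these few counts.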
Dear gilhm,
Can you please tell me what you ended up doing? Were you able to use the small list of genes for the analysis, or did you decide to subset those genes from an analysis of all genes in your organism?
Thanks! Morgan