10.6 years ago
AW
Hi,
I have some RNA-seq samples that I want to normalize and then output as RPKM expression values, using the following edgeR commands:
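```r
library(edgeR)
expr <- DGEList(counts=data, group=conditions)  # data: raw count matrix; conditions: group of each sample
expr <- calcNormFactors(expr)                   # TMM normalization factors (the default method)
expr_norm <- rpkm(expr, log=FALSE, gene.length=gene_lengths)  # gene_lengths: lengths in bases, one per row of data
```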
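For reference, my understanding is that `rpkm()` on a DGEList divides each count by the effective library size (`lib.size * norm.factors`, in millions) and then by the gene length in kilobases. The sketch below spells that formula out in base R on made-up numbers (the toy matrix, lengths and factors are hypothetical), with an optional cross-check against edgeR if it is installed:

```r
# Toy data (hypothetical): 2 genes x 2 samples
counts <- matrix(c(100, 400, 200, 600), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
gene_length  <- c(geneA = 1000, geneB = 2000)  # gene lengths in bases
norm_factors <- c(s1 = 1.0, s2 = 0.8)            # e.g. TMM factors from calcNormFactors()

# RPKM = count / (effective library size in millions) / (gene length in kb),
# where effective library size = raw library size * normalization factor
manual_rpkm <- function(counts, gene_length, norm_factors) {
  eff_lib <- colSums(counts) * norm_factors
  cpm     <- sweep(counts, 2, eff_lib / 1e6, "/")
  cpm / (gene_length / 1e3)  # recycles down columns, so each row is divided by its own length
}

manual <- manual_rpkm(counts, gene_length, norm_factors)
print(manual)

# Optional cross-check against edgeR, if installed
if (requireNamespace("edgeR", quietly = TRUE)) {
  y <- edgeR::DGEList(counts = counts)
  y$samples$norm.factors <- norm_factors
  from_edger <- edgeR::rpkm(y, gene.length = gene_length, log = FALSE)
  print(all.equal(from_edger, manual, check.attributes = FALSE))
}
```

This makes it clear that the TMM factors only enter RPKM through the effective library sizes, while the gene length enters as a separate per-row divisor.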
I'd be very grateful if you could answer these questions.
- When creating the object with `expr <- DGEList(counts=data, group=conditions)`, what effect does specifying groups have on the TMM normalisation? How does TMM use this information, and how would the results differ if I did or did not specify groups?
- The expression data I am using was obtained by mapping reads onto de novo contigs assembled with Trinity. I chose the most highly expressed contig in each cluster as the "best isoform" and summed the expression across all contigs in the cluster as the expression value for that cluster. Therefore I do not have one obvious gene length to use. Should I use the length of the longest contig in the cluster?
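For the second question, one way to compare candidate cluster lengths is to compute several per-cluster summaries side by side. The sketch below assumes a hypothetical per-contig table with columns `cluster`, `contig`, `length` and `count` (a single sample's counts, for simplicity). The expression-weighted mean length is included only as an illustrative alternative, not as a recommendation:

```r
# Hypothetical per-contig table: one row per Trinity contig
contigs <- data.frame(
  cluster = c("c1", "c1", "c1", "c2", "c2"),
  contig  = c("c1_i1", "c1_i2", "c1_i3", "c2_i1", "c2_i2"),
  length  = c(1500, 900, 2100, 600, 800),
  count   = c(300, 50, 100, 20, 80)
)

# One data frame per cluster
by_cluster <- split(contigs, contigs$cluster)

# Rows = clusters; columns = summed count plus three candidate lengths
cluster_lengths <- t(sapply(by_cluster, function(d) {
  c(
    summed_count = sum(d$count),                        # expression value used for the cluster
    longest      = max(d$length),                       # length of the longest contig
    best_isoform = d$length[which.max(d$count)],        # length of the most highly expressed contig
    weighted     = sum(d$length * d$count) / sum(d$count)  # expression-weighted mean length
  )
}))
print(cluster_lengths)
```

Whichever column is chosen would then be passed as `gene.length` to `rpkm()`, in the same order as the rows of the count matrix.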
Thanks!
Any questions relating to edgeR may be best answered via the mailing list.
http://www.bioconductor.org/help/mailing-list/
If I were you, I'd search the mailing list first to see whether someone else has asked a similar question. If not, make a post. The developers of edgeR and other Bioconductor packages are quite active on the mailing lists.
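For the first question (whether `group` changes the TMM factors), a quick empirical check is easy to run before posting. My understanding is that `calcNormFactors()` computes TMM factors from the count matrix and library sizes only, and that the group information is used later (e.g. by `estimateDisp()` and `exactTest()`). The sketch below tests this on simulated counts; the depths and the block of up-regulated genes are arbitrary choices for illustration:

```r
library(edgeR)

set.seed(1)
# Simulated counts: 1000 genes x 6 samples with sample-specific sequencing depths
depth  <- c(1, 1.5, 0.8, 1.2, 2, 0.9)
counts <- matrix(rpois(1000 * 6, lambda = 50 * rep(depth, each = 1000)), ncol = 6)
# Make 100 genes strongly up in the last three samples so TMM has a composition bias to correct
counts[1:100, 4:6] <- counts[1:100, 4:6] * 5L

conditions <- factor(rep(c("A", "B"), each = 3))

# Same counts, with and without group information
with_group    <- calcNormFactors(DGEList(counts = counts, group = conditions))
without_group <- calcNormFactors(DGEList(counts = counts))

print(with_group$samples$norm.factors)
print(without_group$samples$norm.factors)

identical_factors <- isTRUE(all.equal(with_group$samples$norm.factors,
                                      without_group$samples$norm.factors))
print(identical_factors)
```

If the two sets of factors match, specifying `group` does not affect the normalisation step itself, and any differences would only show up in the downstream dispersion estimation and testing.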