For context, I have a set of TPM values for multiple genes across different samples, which I converted to log2(TPM+1), and I need to calculate differential expression from these RNA-seq values. I've been using this workshop material as a guide: https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2019_March_UCSF_mRNAseq_Workshop/master/differential_expression/DE_Analysis.Rmd. The overall aim is to use PANDA to map TF motifs and combine them with gene expression data to create networks representing interactions between transcription factors and genes, and to use GSEA to analyze genes ranked by fold change or differential expression p-value.
What I need to do is match one male to one female of the same age group. I have already done this: the males outnumbered the females, so I filtered the males down to an equal number with the same age distribution. The guide, however, derives two factors and creates a new variable "group" that combines factor1 and factor2. In my case, would gender be the factor I use to define the groups?
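For illustration, here is a minimal sketch of what a workshop-style "group" variable could look like for this kind of data. The sample table, its column names (`sample`, `sex`, `age_group`) and the values are hypothetical placeholders, not taken from the actual dataset. It only shows that a combined factor has one level per sex/age-group combination, whereas sex on its own has just two levels.

```r
# Hypothetical sample annotation: one row per sample, matched by age group.
meta <- data.frame(
  sample    = paste0("S", 1:6),
  sex       = factor(c("F", "M", "F", "M", "F", "M")),
  age_group = factor(c("20-29", "20-29", "30-39", "30-39", "40-49", "40-49"))
)

# Workshop-style combined factor: one level per sex/age-group combination.
meta$group <- interaction(meta$sex, meta$age_group)

# Compare the two candidate grouping variables.
table(meta$sex)      # samples per sex
levels(meta$group)   # combined sex.age_group levels
```

If the male versus female comparison is the only one of interest, sex alone may be enough as the grouping factor; the combined factor matters mainly when comparisons within each age group are wanted.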
I would suggest reading the limma manual/user guide instead: https://www.bioconductor.org/packages/release/bioc/html/limma.html

So the manual uses CPM instead of TPM. Do I need to adjust the formulae they use, or is it still the same?
Also, the manual uses a "design" object, but I am not sure how to create one given the nature of my TPM matrix. Could you explain it a bit more, please?
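As a rough sketch of how such a design object is usually built: the design is a matrix with one row per sample (one per column of the expression matrix, in the same order) and one column per model coefficient, and base R's `model.matrix()` creates it from factors. Everything below is an assumption made for illustration: the toy matrix `log_tpm`, the sample names, and the `sex` and `pair` factors are hypothetical stand-ins for the real data. The commented limma calls use `eBayes(trend = TRUE)`, the limma-trend approach for log-scale input; whether it is adequate for log2(TPM+1) rather than log-CPM is itself an assumption worth checking against the user guide.

```r
# Hypothetical stand-in for the real data: genes in rows, samples in columns,
# values already on the log2(TPM + 1) scale.
set.seed(1)
log_tpm <- matrix(rexp(5 * 6), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("S", 1:6)))

# One entry per column of log_tpm, in the same order as the columns.
sex  <- factor(c("F", "M", "F", "M", "F", "M"), levels = c("F", "M"))
pair <- factor(c(1, 1, 2, 2, 3, 3))   # matched male/female pairs

# Unpaired design: intercept = female mean, "sexM" = male minus female.
design <- model.matrix(~ sex)

# Paired alternative: one coefficient per pair plus the sex effect.
design_paired <- model.matrix(~ pair + sex)

# The design must have exactly one row per sample.
stopifnot(nrow(design) == ncol(log_tpm))

# With limma installed, the fit would then look roughly like:
# library(limma)
# fit <- lmFit(log_tpm, design)
# fit <- eBayes(fit, trend = TRUE)   # limma-trend, for log-scale input
# topTable(fit, coef = "sexM")
```

The paired version uses the one-to-one matching directly, so the `sexM` coefficient is estimated within matched pairs; the unpaired version treats the samples as two independent groups.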