Can someone explain how to run differential expression (DE) analysis using TPM instead of CPM, please? All the DE guides I've found use only CPM values, but I need to work with TPM values. Either example code or a pointer to a manual/tutorial would be most appreciated.
Also, note that I'm working with a .gz file that I read into RStudio with data.table::fread, which presumably affects what kinds of objects I need to use when coding the DE process.
Okay, so even if the tutorial gives code like
It should still be fine to just put in my tpm file where CPM should go? (Note here that they use a cpm function, but there isn't a tpm function, to my knowledge.)
And as for the second part of what I asked, here is what I have so far:
"matchedgeneTPM" is the table of TPMs I have (I already converted to log2(TPM+1) beforehand), and "females" and "males" are the lists of female and male sample IDs I also constructed beforehand. Everything works fine until the last line, which gives the error "Error in getEAWP(object) : data object isn't of a recognized data class". Would you be able to explain how to fix this error, please?
The cpm() function takes in a matrix of counts. So does calcNormFactors. Just take your TPM, log it, and pass it through limma.
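For illustration, here is a minimal sketch of what "log it and pass it through limma" could look like. The toy matrix, the sample names, and the sex labels are all made up, and trend = TRUE in eBayes() is an optional choice rather than something this thread prescribes.

```r
library(limma)

# Toy TPM matrix: 200 hypothetical genes (rows) x 6 hypothetical samples (columns)
set.seed(1)
tpm <- matrix(rexp(200 * 6, rate = 0.1), nrow = 200,
              dimnames = list(paste0("gene", 1:200),
                              c("F1", "F2", "F3", "M1", "M2", "M3")))

# log2(TPM + 1); skip this step if the values are already logged
log_tpm <- log2(tpm + 1)

# Sex labels, in the same order as the columns of log_tpm
sex <- factor(c("female", "female", "female", "male", "male", "male"))
design <- model.matrix(~ sex)     # columns: (Intercept), sexmale

fit <- lmFit(log_tpm, design)     # gene-wise linear models
fit <- eBayes(fit, trend = TRUE)  # moderated statistics with a mean-variance trend
res <- topTable(fit, coef = "sexmale", number = Inf)  # male vs. female, all genes
```

No DGEList, cpm(), or calcNormFactors() call is involved here; the logged matrix goes straight into lmFit().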
That's what I'm trying to do but again, I'm getting an error on the last line and I'm asking how to fix it.
You're mixing two packages, edgeR ("DGEList") and limma ("lmFit"). Just put your log(1+TPM) matrix as the first argument to lmFit.
Thanks, but it's giving me an error saying that the expression object should be numeric and that there are 2 non-numeric columns (because the first two columns are the gene labels and gene IDs I'm working with). Here is my code:
How should I reformat the code? And apologies, I know this should be simple but I'm new to this kind of stuff.
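One way to get past the non-numeric-columns error is to move the two annotation columns out of the table and keep only the sample columns as a numeric matrix. This is a rough sketch on a toy table; the column names gene_name and gene_id and the sample names are hypothetical, so substitute whatever the real table uses.

```r
# Toy stand-in for the table read with fread; all names and values are hypothetical
tab <- data.frame(
  gene_name = c("A", "B", "C"),
  gene_id   = c("ID1", "ID2", "ID3"),
  S1 = c(1.2, 0.0, 3.4),
  S2 = c(1.0, 0.5, 3.1),
  S3 = c(2.2, 0.1, 2.9),
  S4 = c(2.0, 0.3, 3.0)
)

# fread returns a data.table; as.data.frame() turns it into a plain data frame
# (it is a no-op for the toy data frame above)
tab <- as.data.frame(tab)

expr <- as.matrix(tab[, -(1:2)])  # drop the two annotation columns
rownames(expr) <- tab$gene_id     # keep the gene IDs as row names instead
stopifnot(is.numeric(expr))       # lmFit needs a numeric matrix
```

The resulting expr is what would go in as the first argument to lmFit.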
Also, do I need the stuff with group and mm?
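On group and mm: assuming mm is the model matrix from the tutorial, some version of it is needed for a female vs. male comparison, because lmFit without a design treats all samples as replicates of a single group. A minimal sketch of building both from the ID vectors follows; the sample IDs are invented stand-ins for the real "females" and "males" vectors.

```r
# Hypothetical sample IDs standing in for the real "females" and "males" vectors
females <- c("S1", "S3")
males   <- c("S2", "S4")

# Sample IDs in the same order as the columns of the expression matrix
sample_ids <- c("S1", "S2", "S3", "S4")
stopifnot(all(sample_ids %in% c(females, males)))

# One label per sample, with "female" as the reference level
group <- factor(ifelse(sample_ids %in% females, "female", "male"),
                levels = c("female", "male"))

# Design matrix: an intercept plus a male-vs-female indicator column
mm <- model.matrix(~ group)
```

With this coding, the groupmale column is the coefficient that holds the male minus female difference.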