I would like to investigate the relationships among genes in RNA-Seq data, i.e. build gene networks.
Can I use a linear regression model after log-transforming the data, i.e. log(read_count+1)?
I have actually done this and the results look meaningful, but I have doubts about the procedure.
Any thoughts would be appreciated.
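For concreteness, here is a minimal Python sketch of the procedure described in the question, assuming a genes x samples matrix of raw counts. It regresses one gene's log(count + 1) on another's across samples, and then builds a simple co-expression network by linking gene pairs whose log-scale Pearson correlation exceeds a threshold. The correlation-threshold network is a swapped-in illustration, not necessarily what the original poster did. The helper names (log_counts, pairwise_log_regression, coexpression_adjacency), the threshold and the toy counts are all hypothetical.

```python
import numpy as np

def log_counts(counts, pseudocount=1.0):
    """Elementwise natural log of (count + pseudocount)."""
    return np.log(np.asarray(counts, dtype=float) + pseudocount)

def pairwise_log_regression(x_counts, y_counts, pseudocount=1.0):
    """OLS of log(y + pc) on log(x + pc) across samples.

    Returns (slope, intercept, pearson_r).
    """
    lx = log_counts(x_counts, pseudocount)
    ly = log_counts(y_counts, pseudocount)
    slope, intercept = np.polyfit(lx, ly, deg=1)  # highest degree first
    r = np.corrcoef(lx, ly)[0, 1]
    return float(slope), float(intercept), float(r)

def coexpression_adjacency(count_matrix, threshold=0.8, pseudocount=1.0):
    """Genes x samples count matrix -> boolean gene-gene adjacency.

    Genes i and j are linked when |Pearson r| between their
    log-transformed counts across samples is >= threshold.
    Self-links on the diagonal are removed.
    """
    logged = log_counts(count_matrix, pseudocount)
    r = np.corrcoef(logged)  # each row is treated as one variable (gene)
    adjacency = np.abs(r) >= threshold
    np.fill_diagonal(adjacency, False)
    return adjacency

# Toy counts, made up for illustration: 3 genes x 6 samples.
# gene_b is roughly 2x gene_a; gene_c varies independently.
counts = np.array([
    [10, 20, 40, 80, 160, 320],   # gene_a
    [21, 39, 82, 158, 322, 641],  # gene_b
    [50, 12, 70, 30, 45, 60],     # gene_c
])

slope, intercept, r = pairwise_log_regression(counts[0], counts[1])
adjacency = coexpression_adjacency(counts, threshold=0.9)
print(slope, intercept, r)
print(adjacency)
```

Because gene_b is roughly a constant multiple of gene_a, the log-log slope comes out close to 1. The log base does not affect the correlations. Thresholding pairwise correlations treats each gene pair separately and does not distinguish direct from indirect associations, so it is only one possible way to define network edges.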
Agreed, but note that when you talk about an exponential relationship in gene expression, it is really an exponential relationship in read counts. We don't know the true relationship between read count and expression. Some people think log(reads) is already proportional to expression.
Yes, I think so as well (that log(reads) is proportional to expression). This is because the original RNA molecules are amplified exponentially (by PCR) before they are measured by microarray, RNA-seq, TaqMan or any other assay that measures PCR-amplified RNA.
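Taking the exponential model in these two comments at face value, here is a minimal Python sketch. It assumes (hypothetically) that read counts grow exponentially with some underlying quantity x, i.e. reads = a * exp(b * x). Under that assumption log(reads) is exactly linear in x, so a straight-line fit on the log scale recovers a and b, while a straight line on the raw scale fits poorly. The function name fit_exponential_via_log and the values a = 5, b = 0.7 are illustrative only, not estimates from real data.

```python
import numpy as np

def fit_exponential_via_log(x, reads):
    """Fit reads = a * exp(b * x) by OLS of log(reads) on x.

    Returns (a, b). Assumes every read count is > 0 (no pseudocount).
    """
    x = np.asarray(x, dtype=float)
    log_reads = np.log(np.asarray(reads, dtype=float))
    b, log_a = np.polyfit(x, log_reads, deg=1)  # slope first, then intercept
    return float(np.exp(log_a)), float(b)

# Hypothetical noise-free model: reads = 5 * exp(0.7 * x)
x = np.arange(1, 11, dtype=float)
reads = 5.0 * np.exp(0.7 * x)

a_hat, b_hat = fit_exponential_via_log(x, reads)

# Pearson correlation of x with raw reads vs. with log(reads)
r_raw = np.corrcoef(x, reads)[0, 1]
r_log = np.corrcoef(x, np.log(reads))[0, 1]
print(a_hat, b_hat, r_raw, r_log)
```

With real data the model itself is the open question raised above: the sketch only shows what the log transform buys you if the exponential relationship actually holds.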
I think the log-transformed count does preserve the original proportions, no? But for the purpose of the linear model, would the log-transformed count introduce non-linearity?
Would the log-transformed values violate the linearity assumption of linear regression? Are the independent and dependent variables still linearly related after the transformation?
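To make the linearity question concrete, here is a minimal Python sketch with two made-up, noise-free relationships between a predictor x and a response y. The linearity assumption of ordinary least squares concerns the scale on which you actually fit the model: log(y) against log(x) is exactly linear when y = a * x^b (a multiplicative, power-law relation), but if the raw relation is additive with an offset, such as y = 2x + 50, the log-log fit leaves systematic, curved residuals. The helper linear_fit_residuals and all coefficients are hypothetical; no pseudocount is used because every value is positive.

```python
import numpy as np

def linear_fit_residuals(x, y):
    """OLS fit of y on x; returns (slope, intercept, residuals)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    return float(slope), float(intercept), residuals

x = np.array([1, 2, 5, 10, 20, 50, 100, 200, 500, 1000], dtype=float)

# Case 1 (hypothetical): multiplicative / power-law relation y = 3 * x**0.8
y_power = 3.0 * x ** 0.8
s1, i1, res1 = linear_fit_residuals(np.log(x), np.log(y_power))

# Case 2 (hypothetical): additive relation with an offset, y = 2 * x + 50
y_additive = 2.0 * x + 50.0
s2, i2, res2 = linear_fit_residuals(np.log(x), np.log(y_additive))

print("power law, log scale: slope", round(s1, 3), "max |residual|", np.abs(res1).max())
print("additive,  log scale: slope", round(s2, 3), "max |residual|", np.abs(res2).max())
```

In practice the same check is done by plotting the residuals of the fitted model against the fitted values or the predictor: a systematic pattern (here res2 is positive at both ends and negative in the middle) suggests the log scale does not linearize the relationship, while structureless residuals are consistent with the assumption.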