Question

salmon TPM output to linear regression in R

5.0 years ago

evelyn ▴ 230

I want to find candidate genes for 20 different chemical compounds. I am using TPM data for 50 cultivars and have a matrix of TPM values for each gene across all 50 cultivars, where A1 to A6 are genes and A86 to A60 are cultivars:

gene   A86   A90   A99   A16    A09   A60
A1     0     0.4   0     0      0     0
A2     0     0     0     0      0     0
A3     0.5   0     0     0.42   0     0
A4     0     0     0     0      0     0
A5     0     0     0     0      0     0
A6     0     0     0     0      0     0

I also have a concentration dataset for each chemical compound, like:

Cultivar  Compound_X
A86  20.5
A90  5.6
A99  7.1
A16  12
A09  1.5
A60  9.9

I have TPM values for all cultivars, but for some compounds the concentration values are missing for some of the cultivars. I want to run a standard linear regression in R to find candidate genes for each chemical compound based on their p-values.

for (gene in 1:ngenes) {
  model = lm(Compound_X ~ TPM[gene, ])
}

I want to extract the p-values from the linear regression and save them in a vector, one value per gene, for each chemical compound, so that I can find candidate genes. Thank you!

RNA-Seq salmon lm • 1.3k views

You can get the p-values with the summary() function in R: s <- summary(lm(volatile ~ TPM[gene, ])), where volatile is the vector of compound concentrations. The p-values are stored in the coefficients component, s$coefficients, in the column named Pr(>|t|).
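
As a minimal sketch of that suggestion: the concentrations below are the Compound_X values from the question, while the TPM values for the single gene are made up purely for illustration. The p-value for the slope sits in the row named after the predictor.

```r
# Concentrations from the question; tpm_gene values are hypothetical
volatile <- c(20.5, 5.6, 7.1, 12, 1.5, 9.9)   # compound concentration per cultivar
tpm_gene <- c(3.1, 0.4, 1.2, 2.0, 0.1, 1.5)   # TPM of one gene in the same cultivars

s <- summary(lm(volatile ~ tpm_gene))

# Coefficient table: rows are "(Intercept)" and "tpm_gene",
# columns are Estimate, Std. Error, t value, Pr(>|t|)
coefs <- s$coefficients
p_value <- coefs["tpm_gene", "Pr(>|t|)"]
p_value
```

Indexing by row and column name rather than by position makes it harder to pick up the intercept's p-value by mistake.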

5.0 years ago by Nicolas Rosewick 11k

Thank you! I am actually not able to run the lm yet. I want to run it in a for loop, as I mentioned in the question, using the two datasets shown there. Your suggestion will be helpful once that step works.
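
One way the loop could look, as a sketch only. It assumes TPM is a numeric matrix with gene names as row names and cultivar names as column names, and that the concentrations sit in a data frame (called conc here, a hypothetical name) with the columns Cultivar and Compound_X. The toy values, the row demo_gene, and the missing concentration for A99 are invented for illustration.

```r
# Toy inputs shaped like the tables in the question (values are illustrative)
TPM <- matrix(
  c(0,   0.4, 0,   0,    0,   0,
    0,   0,   0,   0,    0,   0,
    0.5, 0,   0,   0.42, 0,   0,
    1.2, 3.4, 2.2, 5.0,  0.3, 2.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A1", "A2", "A3", "demo_gene"),
                  c("A86", "A90", "A99", "A16", "A09", "A60"))
)
conc <- data.frame(
  Cultivar   = c("A86", "A90", "A99", "A16", "A09", "A60"),
  Compound_X = c(20.5, 5.6, NA, 12, 1.5, 9.9)   # NA mimics a missing measurement
)

# Put the concentrations in the same cultivar order as the TPM columns
y <- conc$Compound_X[match(colnames(TPM), conc$Cultivar)]
ok <- !is.na(y)

# One p-value per gene; genes that cannot be fitted stay NA
pvals <- setNames(rep(NA_real_, nrow(TPM)), rownames(TPM))
for (gene in rownames(TPM)) {
  x <- TPM[gene, ]
  # A constant predictor (e.g. all-zero TPM) has no estimable slope
  if (length(unique(x[ok])) < 2) next
  fit <- summary(lm(y ~ x))   # lm() drops cultivars with missing y by default
  pvals[gene] <- fit$coefficients["x", "Pr(>|t|)"]
}
pvals
```

For several compounds, the same loop can be repeated once per concentration column. With many genes tested, adjusting the p-values afterwards, for example with p.adjust(), is worth considering.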

5.0 years ago by evelyn ▴ 230

Is there a way to drop the genes that have zero TPM for all cultivars?
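
A possible approach, sketched under the assumption that TPM is a numeric matrix with gene names as row names (if it is a data frame with a gene column, that column would need to be moved to the row names first). The toy matrix reuses the first three rows of the example in the question.

```r
# Toy TPM matrix (genes in rows, cultivars in columns), as in the question
TPM <- matrix(
  c(0,   0.4, 0, 0,    0, 0,
    0,   0,   0, 0,    0, 0,
    0.5, 0,   0, 0.42, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A1", "A2", "A3"),
                  c("A86", "A90", "A99", "A16", "A09", "A60"))
)

# TPM values are non-negative, so a row sum of zero means every value is zero
keep <- rowSums(TPM) > 0
TPM_filtered <- TPM[keep, , drop = FALSE]   # drop = FALSE keeps a matrix even for one row
rownames(TPM_filtered)
# "A1" "A3"
```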

5.0 years ago by evelyn ▴ 230

Please also note that raw TPM values are not normally distributed, so you should not use lm directly on them. Log2-transform them first (remember to add a pseudocount of your choice).
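
A short sketch of that transformation. The pseudocount of 1 is only an assumption for illustration (the choice is left open above), and the toy values are made up.

```r
# Toy TPM matrix with hypothetical values
TPM <- matrix(
  c(0,   0.4,
    0.5, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("A1", "A3"), c("A86", "A90"))
)

pseudocount <- 1                      # assumed value; avoids log2(0) = -Inf
log_TPM <- log2(TPM + pseudocount)    # element-wise, keeps the row and column names
log_TPM
```

The transformed matrix can then be used in place of TPM in the regression, for example lm(Compound_X ~ log_TPM[gene, ]).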

5.0 years ago by Kristoffer Vitting-Seerup ★ 4.2k