For a problem like this you want to look into the apply function in R. This function lets you perform a function row-wise or column-wise on a dataframe or matrix.
From the help page: apply(X, MARGIN, FUN, ...), where X is your dataframe/matrix, MARGIN is either 1 for row-wise or 2 for column-wise, and FUN is the function that you want to perform. Depending on what you want to do, FUN can be a built-in function such as median or sum, or you can define your own function(x), where x is each row (or column) in your dataframe.
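To make the MARGIN argument concrete, here is a minimal sketch on a small made-up matrix (the object m and its values are just for illustration, not from the question):

```r
# Toy 2 x 3 matrix, filled column by column: rows are (1, 3, 5) and (2, 4, 6).
m <- matrix(1:6, nrow = 2)

# MARGIN = 1: apply the function to each row.
row_sums <- apply(m, 1, sum)

# MARGIN = 2: apply the function to each column.
col_medians <- apply(m, 2, median)

# A user-defined function(x); here x is one row at a time.
row_ranges <- apply(m, 1, function(x) max(x) - min(x))
```

Each call returns one value per row (or per column), so the results are plain vectors.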
So for the example of dataframe df, where columns 2-4 are Malignant and 5-7 are fibroblast, you can run:
pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)
This takes df and, for each row (indicated by the 1, as opposed to each column), runs function(x), which performs a t-test on elements 2-4 against elements 5-7 and returns the p-value (hence the $p.value). The p-values for all rows are stored in the vector pValues.
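Here is a small self-contained sketch of the same idea on simulated data. The column names (gene, mal1, fib1, ...) and the values are made up, and I am assuming column 1 holds an ID. In this sketch only the six measurement columns are passed to apply as a numeric matrix, so the indices shift to 1:3 and 4:6:

```r
# Simulated data: one ID column, three "malignant" and three "fibroblast" columns.
# All names and values are invented for illustration.
set.seed(1)
df <- data.frame(
  gene = paste0("gene", 1:5),
  mal1 = rnorm(5, 10), mal2 = rnorm(5, 10), mal3 = rnorm(5, 10),
  fib1 = rnorm(5, 12), fib2 = rnorm(5, 12), fib3 = rnorm(5, 12)
)

# Keep only the measurement columns, as a numeric matrix.
mat <- as.matrix(df[, 2:7])

# One t-test per row: columns 1-3 (malignant) against columns 4-6 (fibroblast).
pValues <- apply(mat, 1, function(x) t.test(x[1:3], x[4:6])$p.value)
names(pValues) <- df$gene
```

pValues then holds one p-value per gene, in the same order as the rows of df.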
If you are getting this data from raw RNA-Seq data, your best bet is to use a well-established method like DESeq2, edgeR or limma.
Even if you do not have the raw data, it is a much better solution to use limma via its trend functionality! The values you posted are most likely not normally distributed, so you should NOT use a t-test!
Hi, I applied this line: pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)
But I got the following error: Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : missing value where TRUE/FALSE needed
Can someone shed some light on this? Many thanks, Chris
Here is my data: https://drive.google.com/open?id=1LiJD7T6oR5MtABwYqkhUrJFfo7XRxJ_z
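A possible explanation, without having looked at the linked file: if df contains a non-numeric column (gene names in column 1, say), apply converts the whole dataframe to a character matrix, and t.test then ends up with missing values in that comparison. Rows where all values are identical fail as well, with the "data are essentially constant" message. The sketch below uses made-up data and a hypothetical helper called safe_p. It checks the column types, keeps only the numeric columns, and returns NA for rows where t.test stops with an error:

```r
# Hypothetical data: a character ID column and one row (g2) with constant values.
df <- data.frame(
  gene = c("g1", "g2", "g3"),
  m1 = c(5.1, 2.0, 7.3), m2 = c(4.8, 2.0, 7.9), m3 = c(5.5, 2.0, 6.8),
  f1 = c(6.9, 2.0, 7.1), f2 = c(7.2, 2.0, 7.4), f3 = c(6.5, 2.0, 7.7),
  stringsAsFactors = FALSE
)

# Check the column types first; any non-numeric column is a warning sign.
col_classes <- sapply(df, class)

# Keep only the measurement columns so apply() works on a numeric matrix.
mat <- as.matrix(df[, 2:7])
stopifnot(is.numeric(mat))

# safe_p is a made-up helper: it returns NA instead of stopping
# when t.test() raises an error for a row.
safe_p <- function(x) {
  tryCatch(t.test(x[1:3], x[4:6])$p.value, error = function(e) NA_real_)
}

pValues <- apply(mat, 1, safe_p)
```

Rows that come back as NA are the ones worth inspecting by hand (constant values, too few non-missing observations, and so on).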