Here is part of my data table, which I will call "df":
Subject.ID Sex V1 V2 V3 V4 V5
1 GTEX-1117F female 4.24944534 0.18307358 1.99276843 0.21785110 0.74777407
2 GTEX-1128S male 3.286585483 0.165944088 1.843983844 0.444455046 0.307311939
3 GTEX-11EMC female 3.31947343 0.27846061 1.46519089 0.33753984 0.91708799
4 GTEX-11GSP female 3.232353768 0.215492230 1.778208576 0.166458372 0.644225472
5 GTEX-11TTK male 2.934705200 0.256406854 1.712815854 0.256165277 0.948301812
6 GTEX-11ZUS male 3.77188558 0.18434378 2.43189035 0.37373172 1.14339342
7 GTEX-12WSD female 4.22032995 0.21933892 0.93900085 0.08687886 1.06901468
8 GTEX-12ZZX female 3.43616184 0.39473363 2.73205207 0.33525457 1.51399598
9 GTEX-1313W male 3.66334462 0.39418485 2.47248777 0.53545580 0.64634702
10 GTEX-131XW female 2.956614288 0.276317993 1.977096712 0.167743280 1.528071165
How do I correctly format the following code to account for the kind of dataframe I'm working with? I'm using sex as the factors to be interacted. Here is what I have so far:
design <- model.matrix(~ Sex, data = df)
fit <- lmFit(df, design)
fit <- eBayes(fit)
topTable(fit)
The second line gives me the error Expression object should be numeric, instead it is a data.frame with n non-numeric columns
(and I know why I'm getting this error; I'm not sure how to handle the error, though).
Note: I also asked on SE: https://bioinformatics.stackexchange.com/questions/21169/how-do-i-correctly-format-my-limma-ebayes-code-reposting-because-previous-post
Try subsetting df so it's df[,-c(1,2)] - that will exclude the non-numeric columns.
Gives the error "object 'Sex' not found" if I do it like that.
Doing lmFit(data.matrix(new_dataset[,-c(1,2)]), design) gives the error "row dimension of design doesn't match column dimension of data object", and doing fit <- lmFit(new_dataset[,-c(1,2)], design[,-c(1,2)]) gives the same "Expression object should be numeric, instead it is a data.frame with n non-numeric columns" error.
John, I am the author of the limma package. The format of your data is a bit mysterious. Can you explain it a bit more? limma operates on gene expression matrices where the rows are genes and the columns are samples, but your data seems to be the other way around.
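For readers following along, the orientation described above can be sketched roughly as follows. This is only an illustration, not a confirmed fix: it assumes each row of df is one sample and each V column is one feature (gene), which the thread never establishes. The toy df copies the first six rows shown in the question, and the name expr is hypothetical. The limma calls run only if the package is installed.

```r
# Toy data in the same layout as the question (first six rows of the post).
df <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
Subject.ID Sex V1 V2 V3 V4 V5
GTEX-1117F female 4.24944534 0.18307358 1.99276843 0.21785110 0.74777407
GTEX-1128S male 3.286585483 0.165944088 1.843983844 0.444455046 0.307311939
GTEX-11EMC female 3.31947343 0.27846061 1.46519089 0.33753984 0.91708799
GTEX-11GSP female 3.232353768 0.215492230 1.778208576 0.166458372 0.644225472
GTEX-11TTK male 2.934705200 0.256406854 1.712815854 0.256165277 0.948301812
GTEX-11ZUS male 3.77188558 0.18434378 2.43189035 0.37373172 1.14339342
")

# Build the design from the ORIGINAL data frame, which still contains Sex.
design <- model.matrix(~ Sex, data = df)

# Drop the two non-numeric columns, then transpose so that
# rows are features (assumed: V1..V5) and columns are samples.
expr <- t(as.matrix(df[, -c(1, 2)]))
colnames(expr) <- as.character(df$Subject.ID)

# lmFit expects one design row per column (sample) of the expression matrix.
stopifnot(is.numeric(expr), ncol(expr) == nrow(design))

# Run limma only when it is available (it is a Bioconductor package).
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "Sexmale"))
}
```

Under these assumptions, the "row dimension of design doesn't match column dimension of data object" error is what one would expect when the samples-in-rows table is passed without transposing, since the design has one row per sample while the data has one column per V variable.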
How many rows and columns does your data.frame df have? What do the columns V1, V2 represent? How many V columns are there?
It might help if you explained how you created the data.frame df in the first place, i.e., what was the original file that you read into R.
Hello, I sent you a direct email because I do not want the entirety of my code (or even more than what's necessary) in public forums.
You're shotgunning limma/TPM posts on the exact same issue here and on Bioinfo StackExchange. Please stop that and focus content on a single thread. It's just double and triple effort for the same underlying issue. Read the manuals, use standard file formats as they recommend (expression matrix), and follow the best practices.
I made OP aware of this on a discord channel off forum. They're new and on my suggestion, they added links to the cross-posts.
perfect, thanks!