arrange samples in same order in rows and column in R
1
0
Entering edit mode
2.6 years ago
Bioinfonext ▴ 470

Hi,

Could you please also help me how to arrange samples in the same order in rows (targests2) and columns in (betaval2) in R?

 head(betaval2, 10) [,1:5]
                    DC541      DC485       DC490     DC131
1  cg26928153 0.869022470 0.87365220 0.911936463 0.6590429
2  cg16269199 0.625793206 0.78426522 0.805430832 0.5539671
3  cg13869341 0.862986903 0.88750760 0.789204281 0.7218103
4  cg24669183 0.751976011 0.86801138 0.889076957 0.6721988
5  cg26679879 0.379529937 0.34471486 0.383734472 0.4459295
6  cg22519184 0.394488319 0.38111739 0.381072524 0.4119580
7  cg15560884 0.613115675 0.68987661 0.713836687 0.5976042
8  cg01014490 0.009082822 0.01586443 0.006567706 0.1453065
9  cg10692041 0.908627024 0.92327421 0.902507690 0.8106531
10 cg02339369 0.915374608 0.87147961 0.902443744 0.7101839

> head(targets2)
      Sentrix_ID Sentrix_Position    Batch  Category Gender     eGFR
DC541   2.03e+11           R06C01 Batch 15       Control   Male 141.3943
DC485   2.03e+11           R04C01  Batch 8       Control   Male 133.6376
DC490   2.03e+11           R08C01  Batch 8       Control   Male 133.1413
DC131   2.03e+11           R08C01  Batch 7       Control Female 131.9288
DC574   2.03e+11           R03C01 Batch 16       Control Female 130.6548
DC411   2.03e+11           R02C01 Batch 18       Control   Male 127.5505

> fit<-lmFit(betaval2,var)
Error in lmFit(betaval2, var) :
  row dimension of design doesn't match column dimension of data object
R statistics • 2.1k views
ADD COMMENT
0
Entering edit mode

What is contained in the var variable?

First, the probe names (first column of betaval2) should be set as rownames, and then you need to remove that column from the data:

rownames(betaval2) <- betaval2[,1]
betaval2 <- betaval2[,-1]

You then also probably need to try:

fit <- lmFit(t(betaval2), var)

Please ensure --double- and treble-check-- that your metadata and expression data are aligned.

ADD REPLY
0
Entering edit mode

Thank you so much for quick response, but here is the more complete code and I already put CpG as row name and in sampleInfo file (targets2) samples names as rows but still I am getting this error? Is there any way in R to make sure rows in targets2 and columns in mval are in same order?

> library(limma)
> list.files()
[1] "betas_1.csv"                      "betaval2"
[3] "Column.list.txt"                  "regression.new.count.txt"
[5] "sample.annotation.regression.csv" "Sample.Info.regression.txt"
> targets<-read.table("Sample.Info.regression.txt", header=TRUE, stringsAsFactor=FALSE)
> targets$logeGFR=log(targets$eGFR)
> head(targets)
     V1 Sentrix_ID Sentrix_Position    Batch  Category Gender
1 DC541   2.03e+11           R06C01 Batch 15       Control   Male
2 DC485   2.03e+11           R04C01  Batch 8       Control   Male
3 DC490   2.03e+11           R08C01  Batch 8      Control   Male
4 DC131   2.03e+11           R08C01  Batch 7       Control Female
5 DC574   2.03e+11           R03C01 Batch 16      Control Female
6 DC411   2.03e+11           R02C01 Batch 18       Control   Male
      eGFR Age   RRT       CD8T       CD4T         NK       Bcell       Mono
1 141.3943  29 FALSE 0.09559439 0.22490564 0.00000000 0.087392632 0.03744009
2 133.6376  42 FALSE 0.04212238 0.12661890 0.04028556 0.024276752 0.09722832
3 133.1413  39 FALSE 0.03568210 0.15196063 0.03766905 0.031379030 0.07066799
4 131.9288  58 FALSE 0.10451144 0.04749711 0.03571969 0.004003534 0.06539260
5 130.6548  24 FALSE 0.07358490 0.17676019 0.07706284 0.023125340 0.08663073
6 127.5505  31 FALSE 0.01487569 0.11073860 0.01740834 0.065593039 0.05207325
       Gran  logeGFR
1 0.4991278 4.951553
2 0.6401493 4.895132
3 0.6507270 4.891411
4 0.7140364 4.882262
5 0.4966720 4.872559
6 0.7113963 4.848512
> names(targets)[1] <- ""
> targets2<-targets[,-1]
> rownames(targets2)<-targets[,1]
> head(targets2)
      Sentrix_ID Sentrix_Position    Batch Recruitment Category Gender     eGFR
DC541   2.03e+11           R06C01 Batch 15     Belfast  Control   Male 141.3943
DC485   2.03e+11           R04C01  Batch 8     Belfast  Control   Male 133.6376
DC490   2.03e+11           R08C01  Batch 8     Belfast  Control   Male 133.1413
DC131   2.03e+11           R08C01  Batch 7     Belfast  Control Female 131.9288
DC574   2.03e+11           R03C01 Batch 16     Belfast  Control Female 130.6548
DC411   2.03e+11           R02C01 Batch 18     Belfast  Control   Male 127.5505
      Age   RRT       CD8T       CD4T         NK       Bcell       Mono
DC541  29 FALSE 0.09559439 0.22490564 0.00000000 0.087392632 0.03744009
DC485  42 FALSE 0.04212238 0.12661890 0.04028556 0.024276752 0.09722832
DC490  39 FALSE 0.03568210 0.15196063 0.03766905 0.031379030 0.07066799
DC131  58 FALSE 0.10451144 0.04749711 0.03571969 0.004003534 0.06539260
DC574  24 FALSE 0.07358490 0.17676019 0.07706284 0.023125340 0.08663073
DC411  31 FALSE 0.01487569 0.11073860 0.01740834 0.065593039 0.05207325
           Gran  logeGFR
DC541 0.4991278 4.951553
DC485 0.6401493 4.895132
DC490 0.6507270 4.891411
DC131 0.7140364 4.882262
DC574 0.4966720 4.872559
DC411 0.7113963 4.848512
> betaval<-read.table("regression.new.count.txt", header=TRUE, stringsAsFactors=FALSE)
> head(betaval, 10) [,1:5]
           ID       DC541      DC485       DC490     DC131
1  cg26928153 0.869022470 0.87365220 0.911936463 0.6590429
2  cg16269199 0.625793206 0.78426522 0.805430832 0.5539671
3  cg13869341 0.862986903 0.88750760 0.789204281 0.7218103
4  cg24669183 0.751976011 0.86801138 0.889076957 0.6721988
5  cg26679879 0.379529937 0.34471486 0.383734472 0.4459295
6  cg22519184 0.394488319 0.38111739 0.381072524 0.4119580
7  cg15560884 0.613115675 0.68987661 0.713836687 0.5976042
8  cg01014490 0.009082822 0.01586443 0.006567706 0.1453065
9  cg10692041 0.908627024 0.92327421 0.902507690 0.8106531
10 cg02339369 0.915374608 0.87147961 0.902443744 0.7101839
> names(betaval)[1] <- ""
> betaval2<-betaval[,-1]
> rownames(betaval2)<-betaval[,1]
> mval<-log2(betaval2/(1-betaval2))
> head(mval, 10) [,1:5]
                DC541      DC485      DC490      DC131      DC574
cg26928153  2.7300741  2.7896585  3.3723166  0.9507823  2.9886625
cg16269199  0.7418502  1.8620828  2.0494776  0.3126503  1.2317167
cg13869341  2.6550249  2.9799319  1.9045532  1.3755509  2.3476551
cg24669183  1.6002070  2.7173004  3.0027492  1.0360671  2.4162655
cg26679879 -0.7091479 -0.9267193 -0.6834437 -0.3132538 -1.4234119
cg22519184 -0.6181722 -0.6994303 -0.6997048 -0.5134222 -1.1743039
cg15560884  0.6642570  1.1534960  1.3187553  0.5705751  0.8379092
cg01014490 -6.7694800 -5.9549894 -7.2408883 -2.5563076 -5.2000328
cg10692041  3.3138488  3.5889757  3.2105789  2.0980532  3.0335786
cg02339369  3.4351998  2.7614697  3.2095307  1.2930551  3.8525829
> var<-model.matrix(~logeGFR + as.factor(Gender) + Age +CD8T +CD4T +NK + Bcell +Mono ,data=targets)
> fit<-lmFit(mval,var)
Error in lmFit(mval, var) :
row dimension of design doesn't match column dimension of data object
ADD REPLY
0
Entering edit mode

You probably just need:

fit <- lmFit(t(mval), var)

In your model, you are also attempting to adjust for a lot of covariates- is that necessary? The model will 'overfit' with that many parameters (covariates), unless you have a very large sample size.

ADD REPLY
2
Entering edit mode

I'm not convinced the M-values matrix should be transposed, the way the code is presented is correct but there may be samples that are missing in the data or the metadata. this could be easily checked with dim(var), dim(targets2). I agree that the number of covariables could be high, because often proportions of CD8T, CD4T, NK, Bcell and Mono are often highly correlated. A singular value decomposition might help to choose the best covariates to adjust for.

ADD REPLY
0
Entering edit mode

Thanks!

It seems like here is some problem. I will have a look and get back to you.

> dim(mval)
[1] 768078    136
> dim(targets2)
[1] 137  16

Many thanks, Yogesh

ADD REPLY
0
Entering edit mode

This also shows error, Is there any way in R to arrange samples in same rows/column?

fit <- lmFit(t(mval), var)

Error in lmFit(t(mval), var) :
row dimension of design doesn't match column dimension of data object

ADD REPLY
1
Entering edit mode
2.6 years ago
Basti ★ 2.0k

The issue comes from the presence of your cg....... column so you might put it into rownames to keep the information and then remove the column :

 rownames(betaval2)=betaval2[,1]
 betaval2=betaval2[,-1]
ADD COMMENT
0
Entering edit mode

Thank Basti, I already did this, could you please above posted full code?

Many thanks,

ADD REPLY
0
Entering edit mode

Could you provide the output of dim(betaval2), dim(mval)and dim(var)? Thanks

ADD REPLY
0
Entering edit mode

Thank you very much Basti, after correcting same number of samples in phenotype and methylation file, this code works very well.

#model matrix

var<-model.matrix(~logeGFR + as.factor(Gender) + Age +CD8T +CD4T +NK + Bcell +Mono ,data=targets2)

fit<-lmFit(mval,var)


fit2<-eBayes(fit,trend=TRUE, robust=TRUE)

probe<-topTable(fit2,adjust="BH",coef=2,num=Inf)

sig.probe<-probe[which(probe$adj.P.Val<=0.05),]

write.table(sig.probe,file="Result.eGFR.associated.probe.txt",sep="\t",quote=TRUE)

Just few more queries;

Could you please let me know how I can confirm that coef =2 is logeGFR and what else information I can get from this model?

Many thanks,

ADD REPLY
0
Entering edit mode

Normally yes, you can check that your design matrix second column refer to your logeGFR variable. For the second question, I do not know what would be interesting, it depends on your biological questions.

ADD REPLY
0
Entering edit mode

Thanks Basti, could you please also help me how I can understand this regression analysis data (significant eGFR ASSOCIATED CpG); what is t and B?

                    logFC     AveExpr           t     P.Value   adj.P.Val   B
cg03546163  0.410187762 -0.801321796    7.851041916 1.13E-12    8.65E-07    12.30512906
cg17944885  -0.48848589 2.319742786 -5.731964548    6.33E-08    0.024313494 5.152371088
ADD REPLY
1
Entering edit mode

Specific questions on functions output should be adressed by reading function documentations : https://www.rdocumentation.org/packages/limma/versions/3.28.14/topics/toptable

ADD REPLY
0
Entering edit mode

Dear Basti, thanks I wanted to understand that in above regression result sheet log fold change positive for cg03546163 so can I say cg03546163 methylation is positively associated to eGFR means if eGFR increase, methylation is also increased at cg03546163 and cg17944885 methylation is negatively associated with eGFR.

Many thanks,

ADD REPLY

Login before adding your answer.

Traffic: 1929 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6