Hi all,
I am trying to run a survival analysis on my data looking at SNPs which influence speed to develop the disease of interest. I have seen previous studies which have done this using the R plugin for PLINK but I am having difficulty running it.
I'm not sure how to specify the survival time and onset of the disease within the code. I have looked online for examples but the only one I can find is on the PLINK website which doesn't go into specifics. The code I'm using is as follows:
../plink --bfile testsnps --R cox_script_rplugin.txt --out test_snps_surv --noweb
cox_script_rplugin.txt
is as follows:
library(survival)
Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
{
f1 <- function(s)
{
m <- summary( coxph( Surv( PHENO [,1], COVAR[,2] ) ~ s ) )
r <- c( m$coef , m$loglik, m$n )
c( length(r) , r )
}
apply( GENO , 2 , f1 )
}
I have set up my fam file so that the first phenotype column is survival time and the 2nd phenotype column is whether or not the individual got the disease.
The results I'm getting out are:
11 rs4910084 9915879 A NA
11 rs10500715 9929638 C NA
11 rs7952455 9936337 A NA
14 rs7157940 93632946 A NA
17 rs11078308 15411904 A NA
Does anyone have an example of the script they used for this type of analysis? Any help would be greatly appreciated.
Many thanks,
Caragh
Hi
I'm also new here so I could not confirm what is wrong. Our group has only successfully used R plugin to calculate the allele frequency (another example shown in P-link document) at the moment. I'll try cox regression in the following weeks. But if you don't mind, I could discuss with you first.
Could you please run
plink --file mydata --R myscript.R --R-debug
and show me the content ofmylog.R
?Yes, I would be happy to discuss! The content of
mylog.R
after debugging was as follows:Look at your
COVAR <- matrix(NA, nrow = n, ncol = 0, byrow = T)
part. No columns inside right?m <- summary(coxph(Surv(PHENO [,1], COVAR[,2] ) ~ s))
the input is empty.And your
PHENO <- c( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 )
part. All the individuals you put in are controls? If not, are you coding your case/control as 1/0? P-Link by default is 2/1 if I remember correctly.No the cases and controls are specified in the fam file (there should be 325 in total) but they're not being translated across for some reason.
I guess the reason you get all NAs because the survival time is empty~ Are you willing to fix this part and see what is going on first?
I have specified the survival time in the fam file (look at the original question) but I can't figure out why it is not translating to the script. Do you know how to specify that?
The fam file is generated from ped file right? Then it should contain the first 6 columns of your ped file. Why do you say that the first phenotype column is survival time and the 2nd phenotype column is whether or not the individual got the disease?
Because you can specify the phenotype you want to use. This can be added to the fam file. Should I input it separately? If so how do I do this?
Hi, I'm currently doing something similar as you are doing. I'm just wondering whether all the result you got were like that?
Yes I'm afraid they were all like this. Have you been able to get it to work?