How to duplicate a sample entry in VCF files or PLINK
2.7 years ago
Mari • 0

Hi,

Some participants in my GWAS were infected on two separate occasions, but they only needed to be genotyped once.

For the association analysis to work, I would like to duplicate the repeat samples (so that I have two entries of their SNP information) and, when duplicating each entry, modify the sample ID so that I can tell the two entries apart in the downstream association analysis.

E.g. SAMPLE_1 -> SAMPLE_1_dup

Can anyone think of a way to do this in PLINK binary format? Modifying the .fam files is fine, but I'm not sure how to modify the associated .bed files at the same time. Alternatively, I also have VCF files.

Thank you for any help in advance,

Mari

vcf GWAS PLINK duplicates • 1.3k views
If they were only genotyped once, then their genotype should not change. As such, it does not make sense to perform a genetic association on this sample, right? It sounds as if you want to do some time-series analysis instead.

Yes, it was the same person who was genotyped, but they were infected on a separate occasion, so we have two cases for the same person.

I won't do a time-course analysis, as previous infection does not offer much protection against reinfection in our model (although I will include it as a covariate).

I think I've come up with a workaround, whereby I:
1) Duplicated the GWAS files
2) Filtered the second (duplicate) batch of PLINK files to include only those who were re-challenged
3) Renamed the IIDs of the re-challenge samples
4) Concatenated the re-challenge samples with the original data using bmerge
5) Repeated the above for the pheno/covar files
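
For readers who want to try the workaround, here is a minimal Python sketch of the bookkeeping behind those steps. It only builds the text files and command lines and does not run PLINK. It assumes PLINK 1.9, where `--keep` takes a file of `FID IID` lines and `--update-ids` takes four columns (old FID, old IID, new FID, new IID). The file prefixes, the `_dup` suffix, and the function names are hypothetical and chosen for illustration.

```python
def keep_rows(samples):
    """One 'FID IID' line per re-challenged sample, for a PLINK --keep file."""
    return [f"{fid} {iid}" for fid, iid in samples]


def update_id_rows(samples, suffix="_dup"):
    """Four-column lines (old FID, old IID, new FID, new IID) for --update-ids."""
    return [f"{fid} {iid} {fid} {iid}{suffix}" for fid, iid in samples]


def plink_commands(prefix, keep_file, ids_file, out_prefix):
    """Command lines for: subset, rename, then merge back (assumes PLINK 1.9)."""
    subset = f"{prefix}_rechallenge"
    renamed = f"{prefix}_rechallenge_renamed"
    return [
        # Steps 1-2: keep only the re-challenged samples in a copy of the data
        ["plink", "--bfile", prefix, "--keep", keep_file,
         "--make-bed", "--out", subset],
        # Step 3: rename the IIDs in that copy
        ["plink", "--bfile", subset, "--update-ids", ids_file,
         "--make-bed", "--out", renamed],
        # Step 4: merge the renamed copy with the original fileset
        ["plink", "--bfile", prefix, "--bmerge", renamed,
         "--make-bed", "--out", out_prefix],
    ]


# Hypothetical sample IDs, for illustration only
rechallenged = [("FAM1", "SAMPLE_1"), ("FAM2", "SAMPLE_7")]
keep = keep_rows(rechallenged)
ids = update_id_rows(rechallenged)
cmds = plink_commands("gwas", "rechallenge.keep", "rechallenge.ids",
                      "gwas_with_dups")

for cmd in cmds:
    print(" ".join(cmd))
```

The lines in `keep` and `ids` would be written to `rechallenge.keep` and `rechallenge.ids` before running the printed commands. The pheno/covar files (step 5) would need the same renamed IIDs appended.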

Again, if your genotype does not change across conditions, then it stands to reason that the genotype is independent of the phenotype change. You might want to rethink the question: are you looking at whether the genotypes are associated with the probability of reinfection? In that case, code reinfection as the phenotype (e.g. reinfected = 1, not reinfected = 0). This is more of a design question than a software challenge.
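
As a rough illustration of that coding, the Python sketch below builds the rows of a phenotype table with one line per genotyped person and reinfection coded as 1/0. The column name `reinfect`, the sample IDs, and the function name are hypothetical. Note also that PLINK by default expects case/control phenotypes coded as 2/1, so 0/1 coding may need the `--1` flag or recoding; check the PLINK documentation for your version.

```python
def reinfection_pheno_rows(records):
    """Build phenotype rows: header plus one 'FID IID 0/1' line per person.

    records: iterable of (fid, iid, reinfected) with reinfected a bool.
    """
    rows = ["FID IID reinfect"]  # hypothetical phenotype column name
    for fid, iid, reinfected in records:
        rows.append(f"{fid} {iid} {1 if reinfected else 0}")
    return rows


# Hypothetical records, for illustration only
records = [
    ("FAM1", "SAMPLE_1", True),   # reinfected
    ("FAM2", "SAMPLE_2", False),  # not reinfected
]
pheno = reinfection_pheno_rows(records)
print("\n".join(pheno))
```

With this layout each person appears exactly once, so no sample has to be duplicated in the genotype files.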

Thanks Sam, and sorry, the study design is quite confusing: we have participants who were re-infected with the same challenge agent, as well as participants who were re-infected with a different serovar of Salmonella. I've reassessed the data since, and I will be analysing the different infections separately. If a participant had a previous infection, I am including that as a covariate.

But you're correct that I shouldn't associate the same person with two outcomes within the same association study.

@Mari, why did you delete this post?

It became more of a study design question than a technical one (as Sam mentioned), which I thought was less suited to the forum. Happy to keep it up if that's best, though.

Yes, the exchange between you and Sam is still of value. To give the post closure, you may want to write a short answer detailing your new avenue of exploration and accept that. Future users could read your post and go "hey, maybe I should go back to my study design too".
