7.9 years ago
morovatunc
Hi,
I have more than 2,500 patients, and each of them has a mutation-calling file (VCF). I would like to subset my patients based on whether a mutation occurs at a specific location.
I have thought about supplying the location of the mutation as a BED file and filtering patients based on that (with SnpSift or vcftools). But is there an established method for this task that gives a TRUE/FALSE output per patient?
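A minimal sketch of the BED-based idea, assuming one single-sample VCF per patient in a single directory (plain `.vcf` or gzipped `.vcf.gz`) and that the file name before `.vcf` is the patient ID. The function names (`read_bed`, `has_variant_in_regions`, `subset_patients`) are hypothetical, not part of SnpSift or vcftools. It only checks that a record with a non-`.` ALT falls inside a BED region; it ignores FILTER and genotype, and it assumes the chromosome names match between the BED and the VCFs (e.g. `chr7` vs `7`).

```python
import gzip
from pathlib import Path


def read_bed(bed_path):
    """Read BED intervals as (chrom, start, end); BED is 0-based, half-open."""
    regions = []
    with open(bed_path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue  # skip blank, comment and UCSC header lines
            chrom, start, end = line.split()[:3]
            regions.append((chrom, int(start), int(end)))
    return regions


def has_variant_in_regions(vcf_path, regions):
    """Return True if any variant record in the VCF falls inside a BED region."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if fields[4] == ".":
                continue  # ALT "." means no alternate allele at this record
            chrom, pos = fields[0], int(fields[1])  # VCF POS is 1-based
            for r_chrom, r_start, r_end in regions:
                # 1-based POS p lies in BED [start, end) iff start < p <= end
                if chrom == r_chrom and r_start < pos <= r_end:
                    return True
    return False


def subset_patients(vcf_dir, bed_path):
    """Map each patient (file name before '.vcf') to True/False."""
    regions = read_bed(bed_path)
    result = {}
    for vcf in sorted(Path(vcf_dir).iterdir()):
        # only per-patient VCFs; skips index files such as .vcf.gz.tbi
        if not vcf.name.endswith((".vcf", ".vcf.gz")):
            continue
        patient = vcf.name.split(".vcf")[0]
        result[patient] = has_variant_in_regions(vcf, regions)
    return result
```

The returned dictionary can be turned directly into one TRUE/FALSE column of a per-patient data frame, e.g. `flags = subset_patients("vcfs/", "site.bed")` with hypothetical paths.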
Best regards,
Tunc.
You mean you have 2500 separate VCF files? Why not merge them into one VCF?
Thank you for your reply. I keep the data in a single data frame of TRUE/FALSE values, so it would be better to get this information for every individual.
For example:
I don't get it. Why not create one multi-sample VCF?
If I merge all of my VCFs into one union VCF, how can I find out which patient has that specific mutation?
I only want to know whether a single mutation is present.
You will get a VCF in which every patient has their own sample column, containing genotype, quality and depth information (the fields listed in the FORMAT column).
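To make "test the occurrence" concrete for a merged file: a minimal sketch, assuming the merged VCF has a `GT` key in its FORMAT column and one column per patient after FORMAT. The function name `carriers_at_site` is hypothetical. A patient is flagged True if their genotype contains a non-reference allele at the given CHROM/POS (optionally only a specific ALT allele). Missing genotypes (`./.`) are treated as False here; that is an assumption, because when single-sample VCFs are merged, a patient without a record at a site often shows up as missing rather than as a confirmed `0/0`.

```python
import gzip


def carriers_at_site(vcf_path, chrom, pos, alt=None):
    """Return {sample: True/False} for one site in a multi-sample VCF.

    True means the sample's GT contains at least one wanted non-reference
    allele. Missing genotypes ("./.") count as False.
    """
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    samples = []
    result = {}
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                samples = fields[9:]  # sample columns follow FORMAT
                result = {s: False for s in samples}
                continue
            if fields[0] != chrom or int(fields[1]) != pos:
                continue
            alts = fields[4].split(",")
            # GT allele indices: 0 = REF, 1..n = ALT alleles in listed order
            wanted = {str(i + 1) for i, a in enumerate(alts)
                      if alt is None or a == alt}
            gt_index = fields[8].split(":").index("GT")
            for sample, value in zip(samples, fields[9:]):
                gt = value.split(":")[gt_index]
                alleles = gt.replace("|", "/").split("/")
                if any(a in wanted for a in alleles):
                    result[sample] = True
    return result
```

Because records at the same position are OR-ed together, this also works if multiallelic sites were split into several lines.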
I couldn't understand the part where we test for the occurrence of the mutation. Could you recommend a tool for that?
Best,
Tunc.
You can probably do this using vcf-merge from VCFtools or GATK CombineVariants.