7.6 years ago
ddzhangzz
I downloaded more than 4,000 VCF files from TCGA, but I am only interested in mutations in one gene, IDH1. What is the best way to extract the mutations in this gene from these VCF files? The desired output would be a data matrix. Is there a VCF tool that can do this?
If you're working with somatic mutations, you should download the MAF file for each cohort rather than a VCF for each patient/sample.
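As a rough sketch of the MAF route: the snippet below filters a cohort MAF down to one gene and builds a sample-by-mutation 0/1 matrix. It assumes the usual tab-separated MAF layout with the columns Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification and (optionally) HGVSp_Short; the function names and the demo rows are made up for illustration.

```python
import csv
from collections import defaultdict


def gene_mutations_from_maf(lines, gene="IDH1"):
    """Map each tumor sample barcode to the set of changes seen in `gene`.

    `lines` is any iterable of text lines, e.g. an open MAF file handle.
    """
    # MAF files may start with '#' comment lines before the header row.
    body = (ln for ln in lines if not ln.startswith("#"))
    hits = defaultdict(set)
    for row in csv.DictReader(body, delimiter="\t"):
        if row["Hugo_Symbol"] == gene:
            # Prefer the protein change; fall back to the variant class.
            change = row.get("HGVSp_Short") or row["Variant_Classification"]
            hits[row["Tumor_Sample_Barcode"]].add(change)
    return dict(hits)


def to_matrix(hits):
    """Turn {sample: {mutation, ...}} into (columns, {sample: [0/1, ...]})."""
    columns = sorted({m for muts in hits.values() for m in muts})
    matrix = {s: [int(c in muts) for c in columns]
              for s, muts in sorted(hits.items())}
    return columns, matrix


if __name__ == "__main__":
    # Made-up demo rows, not real TCGA data.
    demo = [
        "#version 2.4\n",
        "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\tHGVSp_Short\n",
        "IDH1\tSAMPLE-A\tMissense_Mutation\tp.R132H\n",
        "TP53\tSAMPLE-A\tMissense_Mutation\tp.R175H\n",
        "IDH1\tSAMPLE-B\tMissense_Mutation\tp.R132C\n",
    ]
    columns, matrix = to_matrix(gene_mutations_from_maf(demo))
    print("sample", *columns, sep="\t")
    for sample, row in matrix.items():
        print(sample, *row, sep="\t")
```

Samples with no mutation in the gene do not appear in the MAF rows at all, so they would need to be added as all-zero rows from the cohort's sample list.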
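I think you can use bedtools intersect here, where the -a option would be a BED file with the coordinates of your gene of interest and -b would be the VCF files. Note that by default the output reports the features from -a, so add -wb to get the overlapping VCF records themselves.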
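If you would rather not shell out to bedtools intersect for thousands of files, the same interval filter can be sketched in plain Python and fed straight into a matrix. This is only an illustration: the function names are hypothetical, the window is an assumed, padded GRCh37/hg19 region around IDH1 (check it against your own annotation and genome build), and no FILTER/PASS or somatic-status filtering is applied.

```python
import gzip
import sys
from pathlib import Path

# ASSUMPTION: padded GRCh37/hg19 window around IDH1 on chromosome 2.
# Verify against your annotation; GRCh38 coordinates are different.
GENE_REGION = ("2", 209_100_000, 209_120_000)


def variants_in_region(lines, region=GENE_REGION):
    """Return the set of 'chrom:pos:ref>alt' keys for records inside `region`."""
    chrom, start, end = region
    found = set()
    for line in lines:
        if line.startswith("#"):  # skip meta and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        c = fields[0][3:] if fields[0].startswith("chr") else fields[0]
        pos = int(fields[1])
        if c == chrom and start <= pos <= end:
            found.add(f"{c}:{pos}:{fields[3]}>{fields[4]}")
    return found


def build_matrix(per_sample):
    """Turn {sample: {variant, ...}} into (columns, {sample: [0/1, ...]})."""
    columns = sorted({v for variants in per_sample.values() for v in variants})
    matrix = {s: [int(c in variants) for c in columns]
              for s, variants in sorted(per_sample.items())}
    return columns, matrix


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path)


if __name__ == "__main__":
    # Usage (script name is arbitrary): python gene_matrix.py a.vcf b.vcf.gz
    per_sample = {}
    for name in sys.argv[1:]:
        with open_vcf(name) as handle:
            per_sample[Path(name).name] = variants_in_region(handle)
    columns, matrix = build_matrix(per_sample)
    print("sample", *columns, sep="\t")
    for sample, row in matrix.items():
        print(sample, *row, sep="\t")
```

Here each file counts as one sample and is labelled by its file name; for TCGA you would probably want to map file names to sample barcodes from the download manifest instead.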