Extract mutations for a specific gene from a vcf file
7.6 years ago
ddzhangzz ▴ 90

I downloaded more than 4,000 VCF files from TCGA, but I am only interested in mutations in one gene, IDH1. What is the best way to extract this gene's mutations from these VCF files? The desired output is a data matrix. Is there a VCF tool that can do this?


If you're working with somatic mutations, you should download the MAF file for each cohort rather than a VCF for each patient/sample.
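To illustrate the MAF route, here is a minimal Python sketch that keeps only the rows for one gene. It assumes a tab-separated MAF with the columns `Hugo_Symbol`, `Tumor_Sample_Barcode`, `Variant_Classification` and `HGVSp_Short`, as in GDC-style MAF files. The function name `gene_mutations_from_maf` and the file name in the usage note are made up for this example.

```python
import csv
from collections import defaultdict


def gene_mutations_from_maf(lines, gene="IDH1"):
    """Return {sample barcode: sorted list of changes} for one gene.

    `lines` is any iterable of text lines from a MAF file (hypothetical
    helper, not part of any library).
    """
    # MAF files may start with '#' comment lines before the header row.
    data = (ln for ln in lines if not ln.startswith("#"))
    hits = defaultdict(set)
    for row in csv.DictReader(data, delimiter="\t"):
        if row["Hugo_Symbol"] == gene:
            # Fall back to the variant class if no protein change is given.
            change = row.get("HGVSp_Short") or row["Variant_Classification"]
            hits[row["Tumor_Sample_Barcode"]].add(change)
    return {sample: sorted(changes) for sample, changes in hits.items()}
```

Usage would look something like `with open("cohort.maf") as fh: muts = gene_mutations_from_maf(fh)`; for a gzipped MAF, `gzip.open(path, "rt")` can be used instead of `open`.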


I think you can use bedtools intersect here, where the -a option would be a BED file with the coordinates of your gene of interest and -b would be the VCF files.
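One detail that is easy to get wrong when preparing the BED file: BED intervals are 0-based and half-open, while VCF positions and most published gene coordinates are 1-based and inclusive. The sketch below shows the conversion; `gene_to_bed_line` is a hypothetical helper, and the coordinates in the usage note are placeholders, not the real IDH1 location.

```python
def gene_to_bed_line(chrom, start_1based, end_1based, name):
    """Format a 1-based inclusive interval as a 0-based half-open BED line."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts by one; the end value is numerically unchanged.
    return f"{chrom}\t{start_1based - 1}\t{end_1based}\t{name}"
```

For example, `gene_to_bed_line("chr1", 1, 1000000, "my_region")` gives a line that could be written to `gene.bed`. The chromosome name has to match the naming in the VCF (`chr2` versus `2`). Also note that bedtools intersect reports records from the -a file by default, so if you want VCF records in the output it may be more convenient to pass the VCF as -a and the BED file as -b.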

7.6 years ago

tabix is the best tool for extracting regions of interest from VCF files:

bgzip my.vcf                    # tabix works on block-compressed data only (output: my.vcf.gz)
tabix -p vcf my.vcf.gz          # index the VCF file (creates my.vcf.gz.tbi)
tabix my.vcf.gz chr1:1-1000000  # print the records that fall in this region

For other tabix options, see http://www.htslib.org/doc/tabix.html
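To go from the per-file tabix query to the data matrix asked for in the question, something like the following Python sketch could work. It assumes tabix is on the PATH, that every file is already bgzipped and indexed, and that the file name can stand in for the sample name (an assumption; TCGA VCFs often carry both a normal and a tumor column). All function names are hypothetical, and the region string must be filled in with the gene's coordinates for the genome build your VCFs use.

```python
import subprocess
from pathlib import Path


def tabix_region(vcf_gz, region):
    """Run `tabix <file> <region>` and return the matching VCF records."""
    result = subprocess.run(
        ["tabix", str(vcf_gz), region],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def variant_key(record):
    """Build a 'CHROM:POS:REF>ALT' label from one VCF record line."""
    chrom, pos, _id, ref, alt = record.split("\t")[:5]
    return f"{chrom}:{pos}:{ref}>{alt}"


def presence_matrix(variants_by_sample):
    """Return (columns, rows): rows map each sample to a list of 0/1 flags."""
    columns = sorted({v for vs in variants_by_sample.values() for v in vs})
    rows = {
        sample: [int(col in vs) for col in columns]
        for sample, vs in variants_by_sample.items()
    }
    return columns, rows


def collect(vcf_dir, region):
    """Query every *.vcf.gz in vcf_dir and build the presence/absence matrix."""
    variants_by_sample = {}
    for vcf_gz in sorted(Path(vcf_dir).glob("*.vcf.gz")):
        sample = vcf_gz.name[: -len(".vcf.gz")]  # assumption: one sample per file
        records = tabix_region(vcf_gz, region)
        variants_by_sample[sample] = {variant_key(r) for r in records}
    return presence_matrix(variants_by_sample)
```

Calling `collect("vcfs/", region)` with a region in the same `chr:start-end` form as the tabix command returns the column labels and one 0/1 row per file, which can then be written out with the `csv` module.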

7.6 years ago

If you know the coordinates of the gene, you could use awk or bedtools to get all mutations in a specific genomic interval.
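As an illustration of the interval filter, here is a rough Python counterpart to the awk approach (not awk itself). It needs no index and assumes a plain-text, tab-separated VCF; `variants_in_interval` is a hypothetical name, and `start` and `end` are 1-based inclusive, like VCF positions.

```python
def variants_in_interval(vcf_lines, chrom, start, end):
    """Yield the fields of VCF records whose POS lies in [start, end]."""
    for line in vcf_lines:
        # Skip meta-information lines, the #CHROM header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield fields
```

This only tests the record's POS, so a deletion that starts before the interval and extends into it would be missed. For compressed files, the lines can come from `gzip.open(path, "rt")`.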
