3.5 years ago
soniabedi.07
Hi,
I have two main questions about fishing information out of 100 VCF files:
1) How do I look for a particular gene in 100 VCF files?
2) How do I look for particular mutations in 100 VCF files?
How can I achieve these tasks (not necessarily together) quickly? Should I use R? If so, which package, and how do I go about it?
Please help.
Thanks in advance
bcftools should be your go-to choice. Take a look at this page for querying and this one for filtering. Also: How to check the presence of mutations of a specific gene in a specific VCF file?
Thanks @GenoMax.
So how do I look for common mutations/genes across 100 VCF files? Do I combine all the VCFs into one, or search them one by one?
I am hoping that your VCF files are annotated and contain all the information you are seeking. The best way is to combine all the VCFs into a multisample VCF file and perform your filtering on that. The links provided above will be helpful. Specifically, you can look into bcftools merge.
When you have a multisample VCF file, a sample carries a variant if its genotype contains the alternate allele (0/1 or 1/1). If the genotype is 0/0, there is no variant in that sample. You should also convert genotypes to missing (./.) or 0/0 if the genotype quality (GQ) is below 20 or 30. This is a basic quality control step that should be followed. I am assuming that you are working with a diploid organism.
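To make the genotype logic concrete, here is a minimal Python sketch of that quality control step on a single record of a multisample VCF. It is only an illustration, not what bcftools does internally: the function names, the sample names, and the default threshold of 20 are my own choices, and treating a missing GQ as failing is an assumption you may not want.

```python
def mask_low_gq(vcf_line, min_gq=20):
    """Return the record with GT set to ./. wherever GQ is below min_gq."""
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")          # column 9 is FORMAT, e.g. GT:GQ
    if "GT" not in fmt_keys or "GQ" not in fmt_keys:
        return "\t".join(fields)             # nothing to mask on
    gt_i, gq_i = fmt_keys.index("GT"), fmt_keys.index("GQ")
    for col in range(9, len(fields)):        # sample columns start at column 10
        values = fields[col].split(":")
        # Trailing fields may be omitted; a missing GQ is treated as failing
        # (an assumption made for this sketch).
        gq = values[gq_i] if gq_i < len(values) else "."
        if gq == "." or float(gq) < min_gq:
            values[gt_i] = "./."
            fields[col] = ":".join(values)
    return "\t".join(fields)


def carriers(vcf_line, sample_names):
    """List the samples whose genotype contains a non-reference allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_i = fields[8].split(":").index("GT")
    found = []
    for name, col in zip(sample_names, fields[9:]):
        alleles = col.split(":")[gt_i].replace("|", "/").split("/")
        if any(a not in ("0", ".") for a in alleles):
            found.append(name)
    return found


# Hypothetical record with three samples: s1, s2, s3
record = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:45\t1/1:12\t0/0:60"
masked = mask_low_gq(record, min_gq=20)
print(carriers(masked, ["s1", "s2", "s3"]))
```

Here s2 has a 1/1 call but a GQ of 12, so it is masked and no longer counted as a carrier; only s1 remains.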
Then, you can keep only those loci that fall within your genes of interest. If you know your gene coordinates, you can use bedtools intersect (https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html) to get only those loci that are present in or around your gene. Or, check the links above. Hope this helps.
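If it helps to see what filtering by gene coordinates amounts to, here is a small Python sketch of the idea. It is a plain-Python stand-in, not bedtools itself. The gene coordinates are made up, the function names are hypothetical, and I am assuming 1-based inclusive coordinates (as in VCF POS); note that BED files use 0-based half-open intervals.

```python
def in_gene(chrom, pos, gene, flank=0):
    """True if a 1-based position lies within the gene interval plus flank."""
    g_chrom, g_start, g_end = gene
    return chrom == g_chrom and g_start - flank <= pos <= g_end + flank


def filter_vcf_lines(lines, gene, flank=0):
    """Yield header lines plus the records that fall in or around the gene."""
    for line in lines:
        if line.startswith("#"):
            yield line                        # keep the header untouched
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_gene(chrom, int(pos), gene, flank):
            yield line


# Hypothetical gene interval: (chromosome, start, end), 1-based inclusive
gene = ("chr7", 1000, 2000)
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr7\t950\t.\tA\tG\t50\tPASS\t.",
    "chr7\t1500\t.\tC\tT\t50\tPASS\t.",
    "chr8\t1500\t.\tG\tA\t50\tPASS\t.",
]
kept = list(filter_vcf_lines(vcf, gene, flank=100))
print("\n".join(kept))
```

With a flank of 100 the record at position 950 is kept as "around" the gene; with flank=0 only the record at 1500 on chr7 survives.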
Hi soniabedi.07
I recommend bcftools isec, based on previous experience. Consider reading through this post.
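For intuition, the kind of question an intersection answers ("which variants are shared by at least N files?") can be sketched in plain Python as below. This is not how bcftools isec is implemented and is no substitute for it on real data: it assumes the variants are represented the same way in every file (same normalization), and the function names and the tiny inputs are made up for illustration.

```python
from collections import Counter


def variant_keys(lines):
    """Collect (chrom, pos, ref, alt) keys from the records of one VCF."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                          # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):         # split multi-allelic sites
            keys.add((chrom, int(pos), ref, allele))
    return keys


def shared_variants(files, min_files):
    """Map each variant seen in at least min_files inputs to its file count."""
    counts = Counter()
    for lines in files:
        counts.update(variant_keys(lines))    # each file counts a variant once
    return {key: n for key, n in counts.items() if n >= min_files}


# Three tiny hypothetical "files", each given as a list of record lines
file_a = ["chr1\t100\t.\tA\tG\t50\tPASS\t.", "chr1\t200\t.\tC\tT\t50\tPASS\t."]
file_b = ["chr1\t100\t.\tA\tG,T\t50\tPASS\t."]
file_c = ["chr1\t100\t.\tA\tG\t50\tPASS\t.", "chr1\t200\t.\tC\tT\t50\tPASS\t."]

common = shared_variants([file_a, file_b, file_c], min_files=3)
print(sorted(common))
```

For real files you would pass open file handles (or gzip-opened ones) instead of lists, and set min_files to 100 to ask for variants present in every file.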