Mutation hunting in 100 exome samples
3.6 years ago
soniabedi.07 ▴ 30

Hi,

I have two main questions about fishing information out of 100 VCF files:

1) How do I look for a particular gene across the 100 VCF files?
2) How do I look for particular mutations across the 100 VCF files?

How can I achieve these tasks (not necessarily together) quickly? Should I use R? If so, which package, and how do I go about it?

Please help.

Thanks in advance

vcf R mutations • 1.2k views

Thanks @GenoMax.

So how do I look for common mutations/genes across the 100 VCF files? Do I combine all the VCFs into one file, or search them one by one?


I am hoping that your VCF files are annotated and contain all the information you are looking for. The best way is to combine all the VCFs into one multisample VCF file and perform your filtering on that. The links provided above will be helpful. Specifically, you can look into bcftools merge.
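
As a rough illustration of the merge step, here is a minimal Python sketch that only builds the bcftools merge command line. The file names `vcf_paths.txt` and `merged.vcf.gz` are placeholders, not anything from this thread, and the sketch assumes the input VCFs are already bgzip-compressed and indexed, which bcftools merge requires.

```python
import shlex

def build_merge_command(file_list, output):
    """Return the argv for merging the VCFs listed in `file_list` into one file."""
    return [
        "bcftools", "merge",
        "--file-list", file_list,   # text file with one input VCF path per line
        "--output-type", "z",       # write a bgzip-compressed VCF
        "--output", output,
    ]

# Placeholder file names, chosen only for this example.
cmd = build_merge_command("vcf_paths.txt", "merged.vcf.gz")
print(shlex.join(cmd))
# To actually run it (needs bcftools on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if any path contains spaces.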

When you have a multisample VCF file, the presence of a variant in a sample is indicated by its genotype: 0/1 (heterozygous) or 1/1 (homozygous alternate). If the genotype is 0/0, that sample does not carry the variant. You should also convert genotypes to missing (./.) or 0/0 if the genotype quality (GQ) is below a threshold such as 20 or 30. This is a basic quality control step that should be followed. I am assuming that you are working with a diploid organism.
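
To make the genotype-quality step concrete, here is a small pure-Python sketch that sets GT to ./. for any sample whose GQ falls below a cutoff. It assumes tab-separated VCF data lines with GT and GQ present in the FORMAT column; the function name `mask_low_gq` and the example record are made up for illustration.

```python
def mask_low_gq(vcf_line, min_gq=20):
    """Set GT to ./. for every sample whose GQ is below `min_gq`."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")            # FORMAT column, e.g. GT:GQ
    if "GT" not in keys or "GQ" not in keys:
        return "\t".join(fields)           # nothing to mask on
    gt_i, gq_i = keys.index("GT"), keys.index("GQ")
    for col in range(9, len(fields)):      # sample columns start at index 9
        values = fields[col].split(":")
        if len(values) <= max(gt_i, gq_i):
            continue                       # trailing fields were dropped
        gq = values[gq_i]
        if gq != "." and float(gq) < min_gq:
            values[gt_i] = "./."           # low-confidence call becomes missing
        fields[col] = ":".join(values)
    return "\t".join(fields)

# Made-up record with three samples; the second has GQ 12.
line = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:45\t1/1:12\t0/0:99"
masked = mask_low_gq(line, min_gq=20)
print(masked.split("\t")[9:])
```

Samples with a missing GQ (".") are left untouched here; whether to mask those as well is a judgment call for your data.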

Then, you can keep only those loci that fall in your genes of interest. If you know your gene coordinates, you can use bedtools intersect (https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html) to keep only the loci that lie in or around your gene. Or, check the links above. Hope this helps.
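
The coordinate filter itself is simple enough to sketch in plain Python, which may help to see what an interval intersection does. This is not bedtools, only the same idea; it assumes BED-style regions (0-based start, exclusive end) and 1-based VCF positions, and the region and records below are invented for the example.

```python
def overlaps(chrom, pos, regions, flank=0):
    """True if a 1-based VCF position lies in any BED-style region (+/- flank)."""
    return any(
        c == chrom and start - flank < pos <= end + flank
        for c, start, end in regions
    )

def filter_vcf_lines(lines, regions, flank=0):
    """Yield header lines plus the records that fall inside `regions`."""
    for line in lines:
        if line.startswith("#"):
            yield line                     # keep the VCF header intact
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if overlaps(chrom, int(pos), regions, flank):
            yield line

# Invented gene interval and records, for illustration only.
gene_regions = [("chr1", 1000, 2000)]
records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1500\t.\tA\tG",
    "chr1\t2500\t.\tC\tT",
    "chr2\t1500\t.\tG\tA",
]
in_gene = list(filter_vcf_lines(records, gene_regions))
near_gene = list(filter_vcf_lines(records, gene_regions, flank=500))
print(len(in_gene), len(near_gene))
```

The `flank` argument covers the "in or around your gene" case by widening each interval on both sides.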


Hi soniabedi.07,

I would recommend bcftools isec, based on previous experience. Consider reading through this post
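
To show the kind of question an intersection answers, here is a minimal Python sketch that finds variants shared by several samples using set operations. It is not bcftools isec, only the underlying idea; variants are keyed on (chrom, pos, ref, alt), multi-allelic ALT fields are not split, and the sample names and records are invented.

```python
from collections import Counter

def variant_keys(lines):
    """Collect (chrom, pos, ref, alt) keys from VCF data lines."""
    keys = set()
    for line in lines:
        if line.startswith("#"):
            continue                       # skip header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))   # ALT is kept as-is, not split
    return keys

def shared_variants(per_sample_keys, min_samples):
    """Return variants seen in at least `min_samples` samples."""
    counts = Counter(k for keys in per_sample_keys.values() for k in keys)
    return {k for k, n in counts.items() if n >= min_samples}

# Invented per-sample records, for illustration only.
samples = {
    "S1": ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t100\t.\tA\tG", "chr2\t200\t.\tC\tT"],
    "S2": ["chr1\t100\t.\tA\tG", "chr3\t300\t.\tG\tA"],
    "S3": ["chr1\t100\t.\tA\tG", "chr2\t200\t.\tC\tT"],
}
keys = {name: variant_keys(lines) for name, lines in samples.items()}
in_all = shared_variants(keys, min_samples=len(keys))
in_two = shared_variants(keys, min_samples=2)
print(sorted(in_all), sorted(in_two))
```

Lowering `min_samples` turns "present in every sample" into "present in at least N samples", which is often the more useful question with 100 exomes.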
