VCF Format Genotype Selection For Multiple Samples
12.6 years ago
michealsmith ▴ 800

For a VCF file containing information for multiple samples, like the one below:

#CHROM POS     ID        REF ALT    QUAL FILTER INFO                              FORMAT      NA00001        NA00002        NA00003
20     14370   rs6054257 G      A       29   PASS   NS=3;DP=14;AF=0.5;DB;H2           GT:GQ:DP:HQ 0/0:48:1:51,51 1/0:48:8:51,51 1/1:43:5:.

In genetic analyses of family pedigrees, we usually want to compare genotypes across samples (parent vs. child). For example, I want to select the SNPs that appear in all samples, which means the genotype (GT) field for all three samples should be 0/1 or 1/1.

I know it can be done with bash commands (and this is what I'm doing right now); I'm just curious whether VCFtools has a built-in function for this kind of comparison.

thx

vcftools vcf • 4.8k views

I don't think VCFtools (http://vcftools.sourceforge.net/docs.html) has the functionality we need for this (if it does, I can't find it...), which is why most people write their own Perl or Python scripts to filter their data for pedigree analysis at this stage. You could do it in bash as well, I guess, and search for each genotype flag as a regex.
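For anyone who wants to try the script route, here is a minimal Python sketch of that kind of filter. It is not a VCFtools feature; the function names `carries_alt` and `shared_variants` are made up for illustration. It assumes tab-delimited records with sample columns starting at column 10 and GT as the first FORMAT key, and it keeps a record only when every sample carries at least one non-reference allele.

```python
import re

def carries_alt(sample_field):
    """Return True if the sample's GT has at least one non-reference allele."""
    # Assumption: GT is the first colon-separated subfield of the sample column.
    gt = sample_field.split(":")[0]
    # Alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", gt)
    return any(a not in ("0", ".") for a in alleles)

def shared_variants(lines):
    """Yield data lines in which every sample carries an ALT allele."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        samples = fields[9:]  # sample columns follow the FORMAT column
        if samples and all(carries_alt(s) for s in samples):
            yield line
```

Something like `for rec in shared_variants(open("family.vcf")): print(rec, end="")` would print the matching records, where `family.vcf` is a placeholder file name. Note that the example record in the question would be dropped, because NA00001 is 0/0.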

What is the end goal? Perhaps you want to do more advanced analyses? Do you want to phase the data? Are you looking for a locus that might be disease-causing?

Yeah, Zev, I need to find the disease-causing SNP, i.e., to find which SNP segregates with the disease according to the pedigree.

If you have the variants called, which it looks like you do, why not let VAAST do the work for you? Our lab developed VAAST and our mailing list is very friendly.

Zev, I agree VAAST looks like an interesting tool. I can see how it may be helpful in identifying pathogenic variants in multigenic disease models, but how does it improve gene finding in autosomal recessive or dominant models where there is one causative gene? If @gerrybio2010 is looking for one gene, pulling out variants shared/not shared by proband and parents with a script will do the trick. I haven't used VAAST, but am certainly willing to give it a try.

Yes, binary filtering can do the trick. But what if you are missing data? A binary filter can remove the causal variant if the parents don't have coverage at that site. VAAST takes a probabilistic approach, using knowledge of the trio and the allele frequencies in a background file such as 1000 Genomes. Secondly, VAAST scores how deleterious a mutation is using BLOSUM substitution matrices and OMIM data.
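To make the missing-data point concrete, here is a rough Python sketch of a more tolerant filter. This is not what VAAST does (VAAST is probabilistic); it is just a simple rule-based variant of the binary filter. The names `genotype_state` and `keep_record` and the `max_missing` parameter are hypothetical. The idea is to reject a record only when some sample is confidently homozygous reference, while allowing a limited number of no-call genotypes.

```python
import re

def genotype_state(sample_field):
    """Classify a sample column as 'alt', 'missing', or 'ref' from its GT."""
    # Assumption: GT is the first colon-separated subfield.
    alleles = re.split(r"[/|]", sample_field.split(":")[0])
    if any(a not in ("0", ".") for a in alleles):
        return "alt"      # at least one non-reference allele was called
    if "." in alleles:
        return "missing"  # no ALT seen, but at least one allele is a no-call
    return "ref"          # all alleles called as reference

def keep_record(sample_fields, max_missing=1):
    """Keep a site unless a sample is called reference or too many are missing."""
    states = [genotype_state(s) for s in sample_fields]
    if "ref" in states or "alt" not in states:
        return False
    return states.count("missing") <= max_missing
```

With this rule, a site where the child is 0/1, one parent is 1/1, and the other parent is ./. is kept, whereas a strict all-samples-must-carry-ALT filter would throw it away.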

12.6 years ago
thamathpanda ▴ 40

Bro.......... RTFM

from VCFtools

About:

Merges VCF files by position, creating multi-sample VCFs from fewer-sample VCFs. The tool requires bgzipped and tabix indexed VCF files on input. (E.g. bgzip file.vcf; tabix -p vcf file.vcf.gz) If you need to concatenate VCFs (e.g. files split by chromosome), look at vcf-concat instead.

Usage: vcf-merge [OPTIONS] file1.vcf file2.vcf.gz ... > out.vcf
