Question

VCF Format Genotype Selection For Multiple Samples

1

12.6 years ago

michealsmith ▴ 800

For vcf file including information for multiple samples like below:

#CHROM POS   ID        REF ALT QUAL FILTER INFO                    FORMAT      NA00001        NA00002        NA00003
20     14370 rs6054257 G   A   29   PASS   NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0/0:48:1:51,51 1/0:48:8:51,51 1/1:43:5:.

In genetic analyses involving a familial pedigree, we usually want to compare genotypes across samples (parent vs. child). For example, I now want to select the SNPs that appear in all samples, meaning the genotype (GT) of each of the three samples should be 0/1 or 1/1.

I know this can be done with some bash commands (which is what I'm doing right now); I'm just curious whether VCFtools has a built-in function for this kind of comparison.

thx

vcftools vcf • 4.8k views

ADD COMMENT • link updated 12.6 years ago by thamathpanda ▴ 40 • written 12.6 years ago by michealsmith ▴ 800

0

I don't think VCFtools (http://vcftools.sourceforge.net/docs.html) has the functionality we need for this (if it does, I can't find it...), which is why most people write their own Perl or Python scripts to filter their data for pedigree analysis at this stage. You could do it in bash as well, I guess, and search for each genotype flag as a regex.
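For what it's worth, such a script only takes a few lines of Python. The following is a minimal sketch, not a VCFtools feature. The function names `carries_alt` and `shared_variants` are made up for illustration. It assumes tab-separated VCF data lines with a GT entry in the FORMAT column, and it treats any genotype with at least one non-reference allele (0/1, 1/0, 1/1, phased or unphased) as carrying the variant. Missing calls (./.) count as not carrying it.

```python
def carries_alt(sample_field, gt_index=0):
    """Return True if the sample's GT has at least one non-reference allele."""
    gt = sample_field.split(":")[gt_index]
    # Treat phased (|) and unphased (/) genotypes the same way.
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def shared_variants(lines):
    """Yield VCF data lines where every sample carries a non-reference allele."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype information at this site
        gt_index = fmt.index("GT")
        samples = fields[9:]
        if samples and all(carries_alt(s, gt_index) for s in samples):
            yield line


if __name__ == "__main__":
    # First row: the example from the question (tab-separated).
    # Second row: a made-up variation where NA00001 is heterozygous.
    rows = [
        "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
        "\tGT:GQ:DP:HQ\t0/0:48:1:51,51\t1/0:48:8:51,51\t1/1:43:5:.",
        "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
        "\tGT:GQ:DP:HQ\t0/1:48:1:51,51\t1/0:48:8:51,51\t1/1:43:5:.",
    ]
    # Only the second row passes, because NA00001 is 0/0 in the first.
    for hit in shared_variants(rows):
        print(hit)
```

To run it on a real file, pass an open file handle instead of the list, e.g. `shared_variants(open("trio.vcf"))`, where `trio.vcf` is a placeholder name.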

ADD REPLY • link 12.6 years ago by Alex Paciorkowski 3.5k

0

What is the end goal? Perhaps you want to do more advanced analyses? Do you want to phase the data? Are you looking for a locus that might be disease-causing?

ADD REPLY • link 12.6 years ago by Zev.Kronenberg 12k

0

Yeah, Zev, I need to find the disease-causing SNP, i.e. which SNP segregates with the disease according to the pedigree.

ADD REPLY • link 12.6 years ago by michealsmith ▴ 800

0

If you have the variants called, which it looks like you do, why not let VAAST do the work for you? Our lab developed VAAST and our mailing list is very friendly.

ADD REPLY • link 12.6 years ago by Zev.Kronenberg 12k

1

Zev, I agree VAAST looks like an interesting tool. I can see how it may be helpful in identifying pathogenic variants in multigenic disease models, but how does it improve gene finding in autosomal recessive or dominant models where there is one causative gene? If @gerrybio2010 is looking for one gene, pulling out variants shared/not shared by proband and parents with a script will do the trick. I haven't used VAAST, but am certainly willing to give it a try.

ADD REPLY • link 12.6 years ago by Alex Paciorkowski 3.5k

0

Yes, binary filtering can do the trick. But what if you are missing data? A binary filter can remove the causal variant if the parents don't have coverage at that site. VAAST takes a probabilistic approach, using knowledge of the trio and the allele frequencies in a background file such as 1000 Genomes. Secondly, VAAST scores how deleterious a mutation is using BLOSUM matrices and OMIM data.
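To make the missing-data point concrete, here is a rough sketch of a filter that tolerates no-calls instead of discarding the site. It is not what VAAST does (there is nothing probabilistic here), just a more forgiving variant of the binary filter. The names `genotype_state` and `keep_site` and the `max_missing` threshold are assumptions made up for this example, and a partially missing call such as ./1 is conservatively counted as missing.

```python
def genotype_state(sample_field, gt_index=0):
    """Classify one sample column as 'alt', 'ref', or 'missing' based on GT."""
    gt = sample_field.split(":")[gt_index]
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "missing"  # no-call, e.g. ./. when there is no coverage
    if all(a == "0" for a in alleles):
        return "ref"      # confidently homozygous reference
    return "alt"          # at least one non-reference allele


def keep_site(sample_fields, gt_index=0, max_missing=1):
    """Keep a site unless a sample is confidently homozygous reference.

    At least one sample must carry the variant, and at most `max_missing`
    samples may be no-calls.
    """
    states = [genotype_state(s, gt_index) for s in sample_fields]
    if "ref" in states:
        return False
    if "alt" not in states:
        return False
    return states.count("missing") <= max_missing


if __name__ == "__main__":
    # Hypothetical trio where one parent has no coverage at the site.
    trio = ["0/1:48:8:51,51", "./.:.:.:.", "1/1:43:5:."]
    print(keep_site(trio))                 # tolerant filter keeps the site
    print(keep_site(trio, max_missing=0))  # strict filter drops it
```

With `max_missing=0` this behaves like the strict binary filter, so the threshold makes the trade-off between sensitivity and stringency explicit.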

ADD REPLY • link 12.6 years ago by Zev.Kronenberg 12k

score 0 · Answer 1 · 2012-05-16

Bro.......... RTFM

from VCFtools

About:

Merges VCF files by position, creating multi-sample VCFs from fewer-sample VCFs. The tool requires bgzipped and tabix indexed VCF files on input. (E.g. bgzip file.vcf; tabix -p vcf file.vcf.gz) If you need to concatenate VCFs (e.g. files split by chromosome), look at vcf-concat instead.

Usage: vcf-merge [OPTIONS] file1.vcf file2.vcf.gz ... > out.vcf