UPDATE: I added another relevant question to this post instead of creating a new one. I should start by saying that I am not sure whether what I am doing makes sense conceptually! I am very new to 1KG data and population genetics, and this is the task I have been given.
Basically, I want to see the variant effect of every single variation in every individual who participated in 1KG. So I got the VCF file for a given gene, "CHAT", from the 1KG project using:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -N -e 'select concat("ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.",K.chrom,".phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz"),concat(RIGHT(K.chrom,LENGTH(chrom)-3),":",MIN(K.txStart)+1,"-",MAX(K.txEnd)) from knownGene as K,kgXref as X where K.name=X.kgId and X.geneSymbol="CHAT" ' | xargs ./tabix-0.2.5/tabix -fh
I then ran the Variant Effect Predictor (VEP) on the resulting file:
perl /home/superiois/Downloads/variant_effect_predictor/variant_effect_predictor.pl -database -i sampleVCF_chat.vcf -sift b -polyphen p
My questions are:
1) The output does not contain a prediction for every variation: some variants have SIFT/PolyPhen output, but others don't. Why is that?
2) How can I get the variant effect prediction for each individual? The VEP output has no information about the samples or about the frequency of the variations. Basically, I would like to see how many individuals carry a given pathogenic variant.
Thanks for your help.
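To make the second question concrete, here is a minimal sketch of the kind of per-variant count I am after. It is only an illustration, not a tested pipeline: it assumes the usual VCF layout (sample genotypes from the 10th column onward, with a GT entry in the FORMAT column), the function name `count_carriers` is made up, and the two records in `example_vcf` are toy data, not real CHAT variants.

```python
def count_carriers(vcf_lines):
    """Count, for each variant, the samples carrying at least one non-reference allele.

    Returns a dict keyed by (chrom, pos, ref, alt).
    Assumes tab-separated VCF records with a GT entry in the FORMAT column.
    """
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the header line
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        carriers = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            # Phased (0|1) and unphased (0/1) genotypes are treated alike.
            alleles = genotype.replace("|", "/").split("/")
            if any(allele not in ("0", ".") for allele in alleles):
                carriers += 1
        counts[(chrom, int(pos), ref, alt)] = carriers
    return counts


# Toy input: three hypothetical samples, two hypothetical variants.
example_vcf = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "10\t100\t.\tA\tG\t100\tPASS\t.\tGT\t0|0\t0|1\t1|1\n"
    "10\t200\t.\tC\tT\t100\tPASS\t.\tGT:DS\t0|0:0.0\t0|0:0.0\t1|0:1.0\n"
)

carrier_counts = count_carriers(example_vcf.splitlines())
for variant, n in sorted(carrier_counts.items()):
    print(variant, n)
```

The idea would be to match these counts to the VEP output by chromosome, position and allele, so that each predicted effect gets a number of carriers next to it. For a real file the lines would come from the opened VCF instead of the toy string.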
Ah, so it means that the variants not labelled pathogenic/benign are not missense ...
Thanks for the reply. I had another side question closely related to this post, which is why I updated my original post.