Hi there,
I'm new to VCF file analysis and would like to download a large database of human SNPs with information about their location, their sequence variation, and whether they can occur in a homozygous state.
So far I have found this directory of files from the 1000 Genomes Project, where I think I can download the relevant data. However, I'm not sure whether I am looking at the right columns.
The data looks like this:
22 16050654 esv3647175;esv3647176;esv3647177;esv3647178 A <CN0>,<CN2>,<CN3>,<CN4> 100 PASS AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV GT 3|0 0|0 0|0 0|0 0|0 0|0 0|4 0|0 0|0 0|3 0|0 0|0 0|0 0|0 0|3 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 3|0 0|0 3|0 0|0 0|0 3|0 0|0 0|0 0|0 0|0 3|0 0|0 0|0 0|0 0|3 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|3 0|0 0|4 0|0 0|0 0|0 3|0 0|0 0|0 0|0 0|3 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 3|0 0|0 0|0 0|0 0|0 3|0 0|0 0|0 0|3 3|0 0|3 2|0 0|0 0|0 ...
Other entries only show 0|0, 0|1 and 1|0, so I initially thought the numbers would indicate the haplotype of the SNP in different individuals. However, I don't understand the difference between 0|2 and 3|0 then.
Edit: I should add that there is no documentation of these columns in the VCF file header.
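For reference, the VCF specification defines each number in a GT entry as an allele index: 0 is the REF allele and n is the n-th entry of the ALT column. Under that reading, 3|0 in the record above would mean <CN3> on the first haplotype and the reference allele on the second. A minimal Python sketch of that lookup follows; the helper name decode_gt is hypothetical and not part of any library.

```python
import re

def decode_gt(ref, alt, gt):
    """Map a GT string such as '3|0' to the alleles it points at."""
    # Index 0 is REF; indices 1..n are the comma-separated ALT entries.
    alleles = [ref] + alt.split(",")
    decoded = []
    # '|' separates phased alleles, '/' unphased ones.
    for idx in re.split(r"[|/]", gt):
        # '.' marks a missing call.
        decoded.append(None if idx == "." else alleles[int(idx)])
    return decoded

# REF and ALT taken from the record shown in the question.
print(decode_gt("A", "<CN0>,<CN2>,<CN3>,<CN4>", "3|0"))
```

For this record the call should print the <CN3> allele followed by the reference allele A.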
Thank you very much! That helps a lot. So when I'm looking for SNPs that can occur in a homozygous state, I would check for at least one entry with n|n or n/n with n > 0?
Yes.
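The check confirmed here can be sketched in Python. This is only an illustration under some assumptions: the record is a single tab-separated VCF data line, the FORMAT column (the ninth) contains GT, and the function name has_hom_alt is hypothetical.

```python
import re

def has_hom_alt(line):
    """Return True if any sample is n|n or n/n with n > 0."""
    fields = line.rstrip("\n").split("\t")
    # Column 9 is FORMAT; sample columns start at column 10.
    gt_pos = fields[8].split(":").index("GT")
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        alleles = re.split(r"[|/]", gt)
        # Only diploid calls are considered; haploid calls such as "1" are skipped.
        if len(alleles) != 2:
            continue
        first, second = alleles
        # Same non-reference, non-missing allele on both haplotypes.
        if first == second and first not in ("0", "."):
            return True
    return False

# Toy record with three samples, not real 1000 Genomes data.
record = "\t".join(["22", "100", "rs_example", "A", "G", "100", "PASS", ".", "GT",
                    "0|0", "0|1", "1|1"])
print(has_hom_alt(record))
```

Applied line by line to the data records of a VCF file (skipping header lines that start with #), this would keep the variants for which at least one individual carries the same alternate allele on both haplotypes.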