Hello,
I'm trying to switch from the Phase I 1000 Genomes data to Phase 3. In Phase I, there was an indicator giving the variant type, so you could, for example, filter out SNPs easily with a grep command. However, they removed the following from Phase 3:
- VT=SNP, indicates the variant is a SNP.
- VT=INDEL, indicates the variant is an indel.
- VT=SV, indicates the variant is a deletion.
Anyone know if there's an easy way to filter out the SNPs? Is there another indicator in the file that I'm missing?
Thanks!
Help us out: where are the VCF files, please?
Here's phase I
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521
and here's phase 3
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
Thanks
Look for the TYPE tag.
I don't think there's a tag like that in these files.
Strangely, there is no such "TYPE" tag in the latest 1000 Genomes Phase 3 data (well, there is one on the X chromosome).
If you are still willing to build a grep-like query:
I would go for perl though:
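For anyone who would rather not depend on a VT or TYPE tag at all, here is a minimal Python sketch of the same grep-like idea. It is not the snippet from the posts above. It assumes a SNP can be recognized purely from the REF and ALT columns (columns 4 and 5 of a VCF record): REF and every comma-separated ALT allele must be a single A, C, G, or T. The names `is_snp`, `filter_snps`, and `open_vcf` are made up for illustration, and multi-allelic sites that mix a SNP with an indel are treated as non-SNPs under this assumption.

```python
import gzip

# Assumption: only plain single-base alleles count as SNP alleles.
BASES = {"A", "C", "G", "T"}


def is_snp(ref, alt):
    """Return True if REF and every comma-separated ALT allele is one base."""
    alleles = [ref] + alt.split(",")
    return all(allele.upper() in BASES for allele in alleles)


def filter_snps(lines, keep_snps=True):
    """Yield header lines unchanged, plus records matching the SNP filter.

    keep_snps=True keeps only SNP records; keep_snps=False drops them.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            # Not a complete VCF record; skip it.
            continue
        if is_snp(fields[3], fields[4]) == keep_snps:
            yield line


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

To use it, pass an open file handle to `filter_snps` and write out the lines it yields, for example `sys.stdout.writelines(filter_snps(open_vcf(path)))` with `path` pointing at one of the downloaded chromosome files. Set `keep_snps=False` to remove the SNPs and keep everything else.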