How do we distinguish between SNP/INDEL/SV in 1000 genomes Phase 3 data
10.1 years ago
jxiang15 ▴ 30

Hello,

I'm trying to switch from the Phase 1 to the Phase 3 1000 Genomes data. In Phase 1 there was an indicator giving the variant type, so you could, for example, filter out SNPs easily with a grep command. However, the tags below were removed in Phase 3.

  • VT=SNP, indicates the variant is a snp.
  • VT=INDEL, indicates the variant is an indel,
  • VT=SV, indicates the variant is a deletion.

Anyone know if there's an easy way to filter out the SNPs? Is there another indicator in the file that I'm missing?

Thanks!

1000genomes VCF • 4.9k views

help us, where is the VCF please?


Look for the TYPE tag in the header:

##INFO=<ID=TYPE,Number=A,Type=String,Description="Type of variant">

I don't think there's a tag like that in these files.


Strangely, there is no such "TYPE" tag in the latest 1000 Genomes Phase 3 data (well, there is one on the X chromosome).

If you are still willing to build a grep-like query:

for file in ALL.chr*.vcf.gz; do zcat "$file" | grep -P "^#|\t[ACGT]\t[ACGT]\t" | gzip > "${file/.vcf.gz/.snps.vcf.gz}"; done

I would go for perl though:

for file in ALL.chr*.vcf.gz; do zcat "$file" | perl -ne 'print if /^#/ || /\t[ACGT]\t[ACGT]\t/' | gzip > "${file/.vcf.gz/.snps.vcf.gz}"; done
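
A possible awk variant, in case you also want to keep the header lines and sites with several single-base ALT alleles (for example ALT=G,T). This is only a sketch: filter_snps is a made-up helper name, and it assumes a tab-delimited VCF on stdin with REF in column 4 and ALT in column 5.

```shell
# filter_snps: hypothetical helper, not part of any tool.
# Reads an uncompressed VCF on stdin and prints the header lines plus every
# record whose REF and all ALT alleles are single A/C/G/T bases.
filter_snps() {
  awk -F'\t' '
    /^#/ { print; next }                                   # keep meta-information and column header
    $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT](,[ACGT])*$/ { print }   # REF and each ALT are one base
  '
}
```

It could be dropped into the same loop, e.g. zcat "$file" | filter_snps | gzip > "${file/.vcf.gz/.snps.vcf.gz}".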
10.1 years ago

bcftools lets you filter variants by type with the option -v, --types snps|indels|mnps|other (a comma-separated list of variant types to select), and it writes well-formed VCF output. For that reason, and for its performance (the latest HTSlib 1.1 core works like a charm), I would definitely recommend it over grep for parsing VCF files. It is as easy as this command:

bcftools view -v snps all.variants.vcf > snps.only.vcf
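
For the per-chromosome files, the same command could be wrapped in a loop along these lines. Treat it as a sketch: snps_name is a made-up helper for the output name, the ALL.chr*.vcf.gz pattern is taken from the grep example in this thread, and -Oz (compressed VCF output) with -o assumes a bcftools 1.x build.

```shell
# snps_name: hypothetical helper that turns X.vcf.gz into X.snps.vcf.gz
snps_name() {
  printf '%s\n' "${1%.vcf.gz}.snps.vcf.gz"
}

for file in ALL.chr*.vcf.gz; do
  [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
  # -v snps keeps SNP records only; -Oz writes compressed VCF to the -o path
  bcftools view -v snps -Oz -o "$(snps_name "$file")" "$file"
done
```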

Thanks, I'll try that. However, bcftools is probably looking for a tag just like I am. It would be great to know what it looks for when doing the filtering. Also, is there a particular reason you prefer bcftools to vcftools? Just curious.


bcftools is faster. This is even stated on the vcftools Perl tools and API page, and briefly described in a small section of the vcftools site.
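
On what the type filter looks at: as far as I know it does not need a VT or TYPE tag, since the type can be worked out from the REF and ALT columns. The sketch below only approximates that idea and is not bcftools' actual logic; classify_variants is a made-up name, and the rules (one base each = snp, equal length = mnp, different length = indel, symbolic ALT such as <CN0> = other) are simplifying assumptions.

```shell
# classify_variants: hypothetical helper, an approximation of allele-based typing.
# Reads VCF records on stdin and prints CHROM, POS, REF, ALT and one type per ALT allele.
classify_variants() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                    # skip header lines
    {
      n = split($5, alt, ",")                        # one entry per ALT allele
      types = ""
      for (i = 1; i <= n; i++) {
        if (alt[i] ~ /^</)                                t = "other"   # symbolic allele
        else if (length($4) == 1 && length(alt[i]) == 1)  t = "snp"
        else if (length($4) == length(alt[i]))            t = "mnp"
        else                                              t = "indel"
        types = types (i > 1 ? "," : "") t
      }
      print $1, $2, $4, $5, types
    }
  '
}
```

Running zcat on one chromosome file and piping it through classify_variants | cut -f5 | sort | uniq -c would give a rough count per type to compare against what bcftools selects.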
