Identifying unique SNPs between species
5.4 years ago
Wilber0x ▴ 50

I have 8 samples each with a VCF file made from aligning a genome skim of that sample to the same reference. I can see the number of SNPs for each sample in their individual VCF files, but I want to know how many SNPs and which SNPs are unique to each sample.

Can I merge the VCF files into one to do this? If so, how?

I used bowtie2 for the alignment.

snp vcf alignment • 2.6k views
5.4 years ago
guillaume.rbt ★ 1.0k

I've dealt with the same type of analysis.

My solution was to run joint (pooled) genotyping with the GATK "best practices" pipeline, which yields a single multi-sample VCF containing the called genotypes of all my samples.

I then filtered this VCF to obtain the SNPs unique to each sample. For that you can use "SnpSift filter" (http://snpeff.sourceforge.net/SnpSift.html#filter) with the isVariant() and isRef() functions.
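As a rough illustration of such a filter, the sketch below builds an expression that keeps sites where one sample is variant and all others are reference. The helper name private_expr, the SnpSift.jar path and merged.vcf are placeholders of mine, not part of the answer above, and I am assuming that genotypes are addressed by 0-based index as GEN[i]. How isRef() treats missing genotypes is worth checking on your own data.

```shell
#!/bin/sh
# Hypothetical helper: print a SnpSift filter expression that keeps sites where
# sample number $1 (0-based) is variant and every other sample is reference.
# $2 is the total number of samples in the multi-sample VCF.
private_expr() {
  target=$1
  total=$2
  expr="isVariant( GEN[$target] )"
  i=0
  while [ "$i" -lt "$total" ]; do
    if [ "$i" -ne "$target" ]; then
      expr="$expr & isRef( GEN[$i] )"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$expr"
}

# Example: expression for the first of three samples.
private_expr 0 3

# Assumed invocation for the first of 8 samples (paths are placeholders):
#   java -jar SnpSift.jar filter "$(private_expr 0 8)" merged.vcf > sample0.private.vcf
```

Looping the sample index from 0 to 7 would give one private VCF per sample.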

5.4 years ago

Merging is a good idea. You can do it like this:

1. bgzip all vcf files

$ parallel "bgzip -c {} > {}.gz" ::: *.vcf

2. tabix index these files

$ parallel tabix {} ::: *.vcf.gz

3. create a list of compressed vcf files

$ find -maxdepth 1 -iname "*.vcf.gz" > samples.txt

4. merge files, normalize and split multiallelic variants

$ bcftools merge -l samples.txt -Ou | bcftools norm -f ref.fa -m - -o merged.vcf

5. filter the merged VCF file for sites where exactly one sample has at least one ALT allele, and create a new VCF file for each sample containing its private variants

$ parallel "bcftools view -i 'COUNT(GT=\"alt\") = 1' merged.vcf | bcftools view -x -s {} -o {}.privat.vcf" ::: `bcftools query -l merged.vcf`
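To get the "how many" part of the question in one pass, the merged file can also be tallied directly. This is only a sketch of mine, not part of the pipeline above: the function name count_private is made up, it assumes an uncompressed, normalized merged.vcf with GT as the first FORMAT field, and it treats missing genotypes as non-carriers.

```shell
#!/bin/sh
# Hypothetical helper: count sites where exactly one sample carries an ALT allele.
# Reads an uncompressed multi-sample VCF on stdin, prints "sample<TAB>count".
count_private() {
  awk -F'\t' '
    /^##/ { next }                                 # skip meta-information lines
    /^#CHROM/ {                                    # header: sample names start at column 10
      for (i = 10; i <= NF; i++) name[i] = $i
      n = NF
      next
    }
    {
      carriers = 0; who = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                          # GT is assumed to be the first FORMAT field
        gt = f[1]
        gsub(/[0.|\/]/, "", gt)                    # drop REF (0), missing (.), and separators
        if (gt != "") { carriers++; who = i }      # anything left is an ALT allele
      }
      if (carriers == 1) count[who]++
    }
    END { for (i = 10; i <= n; i++) printf "%s\t%d\n", name[i], count[i] }
  '
}

# Usage, if the merged file from step 4 is present:
if [ -f merged.vcf ]; then
  count_private < merged.vcf
fi
```

The counts should agree with the number of records in each per-sample file from step 5, which makes this a cheap cross-check.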