Common SNPs among all populations
2
2
Entering edit mode
6.0 years ago

Hi all Dear,

I have a VCF file that contains approximately 20 millions SNPs. And my VCF file has 5 populations (KHUZ, MAZ, WAZ, EAZ and GIL), each population has 10 samples. So, I want to know how many SNPs are common between the 5 populations?

Best Regard

SNP VCF • 3.7k views
ADD COMMENT
0
Entering edit mode

Take a look at VCFtools

Usage guide

ADD REPLY
0
Entering edit mode

Many thanks for your reply,

I saw the page you shared. And the only thing I can find is the option --exclude-positions-overlap. Can you guide me more precisely?

ADD REPLY
0
Entering edit mode

In the first link you have different tools, as vcf-isec

Creates intersections and complements of two or more VCF files. Given multiple VCF files, it can output the list of positions which are shared by at least N files, at most N files, exactly N files, etc

See also : Obtain one vcf file of shared SNPs from input files with different samples using vcf-isec (vcftools)

ADD REPLY
1
Entering edit mode
6.0 years ago

Assuming the Samples 1-10 are from population 1, 11-20 from population 2 and 21-30 from population 3, you can filter for variant sites, where at least one sample in each population has a non-reference allel, using bcftools view like this:

$ bcftools view -i 'GT[0-9]="alt" && GT[10-19]="alt" && GT[20-29]="alt"' input.vcf

fin swimmer

ADD COMMENT
0
Entering edit mode

I run your script, but I encountered this error.

[filter.c:1379 filters_init1] Error: the tag "INFO/GT[1" is not defined in the VCF header

this is header my vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  EAZ01   EAZ02   EAZ03   EAZ04   EAZ06   EAZ08   EAZ20   EAZ21   EAZ25   EAZ26   GIL01   GIL03   GIL04   GIL05   GIL06   GIL08   GIL11 GIL14    GIL15   GIL17   KHU01   KHU02   KHU04   KHU05   KHU06   KHU07   KHU09   KHU12   KHU15   KHU17   MAZ01   MAZ02   MAZ03   MAZ04   MAZ05   MAZ08   MAZ10   MAZ12   MAZ13   MAZ19   WAZ01   WAZ02   WAZ03 WAZ04    WAZ05   WAZ06   WAZ07   WAZ08   WAZ12   WAZ16
ADD REPLY
0
Entering edit mode

Which version are you using? The syntax I described was introduced in v1.8. Current release is v1.9.

ADD REPLY
0
Entering edit mode

I use bcftools/1.3.1 version. I only have this version on the server.

ADD REPLY
2
Entering edit mode

Have a look at bioconda. This enables you to install the newest version (and a lot of other bioinformatic tools) even without root permissions. Also the first part of this tutorial I'v written might be useful for you.

fin swimmer

ADD REPLY
0
Entering edit mode

I installed the suggested version, but when I run the script, it does not give me the output. And it only displays the content of my VCF file in a graphical and fast way. My expectation is that when i run the script, i will give the number of common PSNs between 5 populations.

i code run:

bcftools view -i 'GT[1-10]="alt" && GT[11-20]="alt" && GT[21-30]="alt" && GT[31-40]="alt" && GT[41-50]="alt"' Final.vcf
ADD REPLY
1
Entering edit mode

What you see is the filtered vcf. One could save this output to a new file using > file.vcf at the end of the command.

If you don't want this and you are only interested in the number of variants you can count the lines excluding the header lines.

$ bcftools view -i 'GT[0-9]="alt" && GT[10-19]="alt" && GT[20-29]="alt" && GT[30-39]="alt" && GT[40-50]="alt"' Final.vcf|grep -v "^#"|wc -l

fin swimmer

PS: Be aware of the counting of the sample numbers. It's starting from 0!

ADD REPLY
0
Entering edit mode

This code works well but with a small change.

$ bcftools view -i 'GT[0-9]="alt" && GT[10-19]="alt" && GT[20-29]="alt" && GT[30-39]="alt" && GT[40-49]="alt"' Final.vcf|grep -v "^#"|wc -l
ADD REPLY
0
Entering edit mode
6.0 years ago

using vcffilterjdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html

an example with 3 populations

select.code:

final String POP1[]={"S1"};
final String POP2[]={"S2","S3"};
final String POP3[]={"S4","S5"};

final String ALL_POPS[][]={POP1,POP2,POP3};

for(String pop[]:ALL_POPS) {
    if(Arrays.stream(pop).
        map(SAMPLE->variant.getGenotype(SAMPLE)).
        noneMatch(G->G.isHet() || G.isHomVar())) return false;
    }
return true;

usage:

java -jar dist/vcffilterjdk.jar -f select.code input.vcf
ADD COMMENT

Login before adding your answer.

Traffic: 1803 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6