Prepare Allele Frequency input file for Sweepfinder2
3.8 years ago
524730309 • 0

Hi there. I have a problem converting a VCF to the input file for Sweepfinder2. According to the Sweepfinder2 manual, the fourth column of the input file indicates whether the site has been polarized (i.e., whether it is known which allele is derived and which is ancestral). Since I don't see AA (ancestral allele) in the INFO field of my VCF, is there any way to find out which allele is ancestral and which is derived?

SNP VCF Sweepfinder2 • 1.8k views
3.1 years ago
Kristian • 0

Hi, if your VCF contains a sample that represents the ancestral state (an outgroup), then you could use a Python script that I have written:

https://github.com/kullrich/bio-scripts/tree/master/vcf/polarizeVCFbyOutgroup.py

It polarizes your VCF file and adds the AA flag to it:

#-vcf vcf input
#-out output
#-ind sample position in VCF which defines the outgroup
#-add adds AA flag to vcf
#-keep specifies if undefined ancestral state (heterozygous ancestral) should be kept in output
python polarizeVCFbyOutgroup.py -vcf VCF.gz -out VCF.polarized.vcf -ind 1 -add
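
To make the idea of polarizing by an outgroup concrete, here is a minimal sketch of the per-site decision. It is not the code of polarizeVCFbyOutgroup.py; the function name `ancestral_allele` and its arguments are hypothetical, and it assumes that a site is only polarized when the outgroup genotype is called and homozygous (a heterozygous or missing outgroup leaves the ancestral state undefined).

```python
import re


def ancestral_allele(ref, alts, outgroup_gt):
    """Return the allele carried by a homozygous outgroup, or None if undefined.

    ref         -- REF allele of the VCF record
    alts        -- list of ALT alleles of the VCF record
    outgroup_gt -- the outgroup's sample field, e.g. "1/1:35" or "0|0"
    (hypothetical helper, for illustration only)
    """
    alleles = [ref] + list(alts)
    # The genotype (GT) is the first colon-separated subfield of the sample column.
    gt = outgroup_gt.split(":")[0]
    # Allele indices are separated by "/" (unphased) or "|" (phased).
    calls = re.split(r"[/|]", gt)
    if "." in calls:
        return None  # missing call: ancestral state unknown
    indices = {int(c) for c in calls}
    if len(indices) != 1:
        return None  # heterozygous outgroup: ancestral state undefined
    index = indices.pop()
    if index >= len(alleles):
        return None  # malformed record
    return alleles[index]
```

For example, with REF `A`, ALT `G` and an outgroup genotype of `1/1`, the sketch treats `G` as ancestral, so `A` becomes the derived allele at that site.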

The polarized VCF can then be used as described in this thread:

How to convert vcf file to frequency file for sweepfinder2

bgzip VCF.polarized.vcf
tabix VCF.polarized.vcf.gz
vcftools --counts2 --derived --gzvcf VCF.polarized.vcf.gz --stdout | awk 'NR<=1 {next} {print $2"\t"$6"\t"$4"\t0"}' > SF2.input
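
For readers who prefer to see what the awk step does, the following is a rough Python sketch of the same conversion. It assumes the vcftools output columns are CHROM, POS, N_ALLELES, N_CHR, ancestral count, derived count (as the awk command above implies). The function name `counts_to_sf2` and the `drop_zero` option are hypothetical. Unlike the awk one-liner, the sketch also writes a `position x n folded` header line, skips sites that are not biallelic, and by default drops sites with a derived count of 0; these three choices are assumptions based on my reading of the Sweepfinder2 manual, so please check them against the manual before relying on them.

```python
def counts_to_sf2(lines, drop_zero=True):
    """Convert `vcftools --counts2 --derived` lines to Sweepfinder2-style rows.

    lines     -- iterable of tab-separated text lines from vcftools
    drop_zero -- hypothetical option: skip sites where the derived count is 0
    Returns a list of output lines (without trailing newlines).
    """
    rows = ["position\tx\tn\tfolded"]  # header line (assumed to be expected)
    for line in lines:
        if not line.strip() or line.startswith("CHROM"):
            continue  # skip blank lines and the vcftools header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[2] != "2":
            continue  # keep biallelic sites only (assumption of this sketch)
        position = fields[1]
        n = int(fields[3])  # number of chromosomes sampled at the site
        x = int(fields[5])  # count of the derived allele
        if drop_zero and x == 0:
            continue
        # The last column is 0 because the site is polarized (unfolded).
        rows.append(f"{position}\t{x}\t{n}\t0")
    return rows
```

A possible way to use it is to save the vcftools output to a file, pass the open file handle to `counts_to_sf2`, and write the returned rows to `SF2.input` joined by newlines.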