Background: I am new to genomics and have few programming skills. I depend on VCF and VCFtools for my genomic analysis.
My population genomic data are in VCF format, which is widely used by the human 1000 Genomes Project. With VCFtools I can get allele frequency statistics for each locus, but these frequencies are summarized over all the individuals in the VCF file.
My question:
I want to prepare an input file for population genetic software such as BayeScan (http://cmpg.unibe.ch/software/bayescan/index.html) to detect selection along the genome. I need to export allele frequency statistics for each population (a group of individuals), and for the same locus (genomic position) every population should list the same alleles in the same order.
However, the allele frequencies from VCFtools do not meet this need. For some zero-count alleles it outputs nothing. I need the allele frequency to be output for every allele at the locus, in all populations.
I would like to hear your suggestions on this problem. Everyone who depends on VCF will benefit from them.
Thanks in advance.
Example:
This is an example of my data; euro, ice and norA are three populations. At positions 1, 3 and 32 there are counts for two alleles in population euro, but for only one allele in the other two populations. I need the zero-count alleles to be output as 0 rather than omitted.
abt6-mao-mbp:BayeScan jianfengmao$ head euro_chr1.frq.count
CHROM POS NALLELES NCHR {ALLELE:COUNT}
1 1 2 9 T:8 A:1
1 3 2 9 A:8 G:1
1 32 2 9 C:7 G:2
1 69 1 9 A:9
1 75 1 9 G:9
1 87 1 9 C:9
1 88 1 9 T:9
1 116 1 9 A:9
1 141 2 9 C:0 T:9
abt6-mao-mbp:BayeScan jianfengmao$ head ice_chr1.frq.count
CHROM POS NALLELES NCHR {ALLELE:COUNT}
1 1 1 3 T:3
1 3 1 3 A:3
1 32 1 3 C:3
1 69 1 3 A:3
1 75 1 3 G:3
1 87 1 3 C:3
1 88 1 3 T:3
1 116 1 3 A:3
1 141 2 3 C:0 T:3
abt6-mao-mbp:BayeScan jianfengmao$ head norA_chr1.frq.count
CHROM POS NALLELES NCHR {ALLELE:COUNT}
1 1 1 6 T:6
1 3 1 6 A:6
1 32 1 7 C:7
1 69 1 6 A:6
1 75 2 6 G:5 A:1
1 87 1 6 C:6
1 88 1 6 T:6
1 116 1 6 A:6
1 141 2 7 C:0 T:7
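One possible way to get what the question asks for is to post-process the per-population count files with a short script. The sketch below is only an illustration, not a VCFtools feature: it assumes the files look like the `.frq.count` listings above (whitespace-separated, alleles written as `ALLELE:COUNT` from the fifth column on), and the function names `parse_frq_count` and `harmonize` are made up for this example. For each locus it collects every allele seen in any population, fixes one allele order (first seen, in the order the populations are given), and reports 0 for alleles a population lacks.

```python
def parse_frq_count(lines):
    """Parse .frq.count-style lines into {(chrom, pos): {allele: count}}."""
    table = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header line.
        if not fields or fields[0] == "CHROM":
            continue
        chrom, pos = fields[0], int(fields[1])
        counts = {}
        # Columns 5 onward hold ALLELE:COUNT pairs.
        for item in fields[4:]:
            allele, count = item.rsplit(":", 1)
            counts[allele] = int(count)
        table[(chrom, pos)] = counts
    return table


def harmonize(pop_tables):
    """Give every population the same alleles, in the same order, per locus.

    pop_tables maps a population name to a table from parse_frq_count.
    Returns {locus: (alleles, {population: [count per allele]})}.
    """
    # Union of alleles per locus, kept in first-seen order.
    alleles_at = {}
    for table in pop_tables.values():
        for locus, counts in table.items():
            seen = alleles_at.setdefault(locus, [])
            for allele in counts:
                if allele not in seen:
                    seen.append(allele)

    result = {}
    for locus in sorted(alleles_at):
        alleles = alleles_at[locus]
        per_pop = {}
        for pop, table in pop_tables.items():
            counts = table.get(locus, {})
            # Missing alleles (or a missing locus) are reported as 0.
            per_pop[pop] = [counts.get(a, 0) for a in alleles]
        result[locus] = (alleles, per_pop)
    return result


if __name__ == "__main__":
    # A few lines copied from the listings above.
    euro = parse_frq_count(["1 1 2 9 T:8 A:1", "1 75 1 9 G:9"])
    ice = parse_frq_count(["1 1 1 3 T:3", "1 75 1 3 G:3"])
    norA = parse_frq_count(["1 1 1 6 T:6", "1 75 2 6 G:5 A:1"])
    merged = harmonize({"euro": euro, "ice": ice, "norA": norA})
    for (chrom, pos), (alleles, per_pop) in merged.items():
        for pop, counts in per_pop.items():
            pairs = " ".join(f"{a}:{c}" for a, c in zip(alleles, counts))
            print(chrom, pos, pop, pairs)
```

With real files, `parse_frq_count(open("euro_chr1.frq.count"))` should work the same way, since the function only iterates over lines. Converting the harmonized counts into the exact BayeScan input layout is left out here and would need to follow the BayeScan manual.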
Hi jianfengmao, this is your 9th question on Biostar. Could you please validate (or comment on) the answers you received to your previous questions?
Also, several of your questions, including this one, can easily be solved by writing a script. As you have been in this field for at least several months, I would not call you a newcomer. You should really learn a scripting language, whichever you like, and solve the problem yourself. This is your work.
Ih3, your suggestion is just what I want and what I am doing now. I intend to do this myself and I have spent all my time on it. I have learned a lot so far, but I have not yet learned everything I need. I have no training in computing or programming. I am sorry for asking so many questions here, really sorry. I will try my best to do this myself. I will learn programming in Perl and Python, just as I started to learn R four years ago in an isolated environment in China. Thanks for everything I have received from you all.
Pierre Lindenbaum, I will do what you have mentioned. Thanks for your kindness.