Hi,
We have identified SNPs from RNA-Seq data in three biological replicates per population, with the calls stored in VCF files. My question is: how do we calculate the SNP frequency for each population by pooling the individual biological replicates?
I have found a VCF format in the 1000 Genomes Project that describes all the populations in one VCF file, with their respective pooled allele frequencies:
1 15211 rs78601809 T G 100 PASS AC=3050;AF=0.609026;AN=5008;NS=2504;DP=32245;EAS_AF=0.504;AMR_AF=0.6772;AFR_AF=0.5371;EUR_AF=0.7316;SAS_AF=0.6401;AA=t|||;VT=SNP
Our format is:
Sample 1 rep1
Sample 1 rep2
Sample 1 rep3
Sample 2 rep1
Sample 2 rep2
Sample 2 rep3
We have an individual VCF file for each replicate from each sample. I just don't have any idea how to put all the replicates' information into a single VCF file and get a pooled AF for each sample, as in the format above.
Many thanks.
You can use vcf-merge from VCFtools to create a single VCF file, using all the individual VCF files as inputs. See also the thread "How Can I Merge A Large Amount Of Vcf Files?"
Hi Stephen, Thanks for your reply. Simple merging won't solve my problem. Will I get per-sample allele frequencies (pooled from the replicates), like the ones in the example above?
You'll get one new column per sample, appended to the end of the line for each site and holding that sample's information.
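Once the replicates sit side by side as columns, a pooled frequency per group can be derived from the genotype (GT) fields, in the same spirit as the 1000 Genomes line above, where AF = AC / AN (3050 / 5008 = 0.609026). What follows is only a minimal sketch, not a vetted pipeline. It assumes the merged file is tab-delimited with a GT entry in the FORMAT column, that the replicate column names (S1_rep1 and so on) and the function name pooled_af are hypothetical, that every non-reference allele counts toward AC, and that missing calls (./.) are skipped rather than treated as homozygous reference. Whether that last choice is right depends on how the variants were called.

```python
import re


def pooled_af(vcf_lines, groups):
    """Pooled alt-allele frequency per group of sample columns.

    vcf_lines: iterable of lines from a merged, tab-delimited VCF.
    groups: dict mapping a group name to its replicate column names
            (names here are hypothetical).
    Returns {(chrom, pos, ref, alt): {group: AF or None}}.
    """
    results = {}
    col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample names) to its index.
            col = {name: i for i, name in enumerate(fields)}
            continue
        if col is None:
            raise ValueError("missing #CHROM header line")
        chrom, pos, _id, ref, alt = fields[:5]
        gt_idx = fields[8].split(":").index("GT")
        site = {}
        for group, samples in groups.items():
            ac = an = 0  # alt allele count, total called alleles
            for sample in samples:
                gt = fields[col[sample]].split(":")[gt_idx]
                for allele in re.split(r"[/|]", gt):
                    if allele == ".":
                        continue  # assumption: missing calls are ignored
                    an += 1
                    if allele != "0":
                        ac += 1  # any non-reference allele counts
            site[group] = ac / an if an else None
        results[(chrom, int(pos), ref, alt)] = site
    return results


# Made-up genotypes for illustration only.
header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "S1_rep1", "S1_rep2", "S1_rep3",
                    "S2_rep1", "S2_rep2", "S2_rep3"])
record = "\t".join(["1", "15211", ".", "T", "G", "100", "PASS", ".", "GT",
                    "0/1", "1/1", "0/1", "0/0", "0/1", "./."])
groups = {
    "Sample1": ["S1_rep1", "S1_rep2", "S1_rep3"],
    "Sample2": ["S2_rep1", "S2_rep2", "S2_rep3"],
}

freqs = pooled_af([header, record], groups)
for site, per_group in freqs.items():
    print(site, per_group)
```

Keep in mind that genotypes called from RNA-Seq can be skewed by allele-specific expression and uneven coverage, so frequencies pooled this way are rough estimates.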
Thanks Stephen.