Single VCF file with multiple populations, as in the 1000 Genomes Project
9.2 years ago
JstRoRR ▴ 60

Hi,

We have identified SNPs from RNA-Seq data in three biological replicates for each population (in VCF files). My question is: how do we calculate the SNP frequency for each population by pooling the individual biological replicates?

I have found that the 1000 Genomes Project uses a VCF format that describes all the populations in one VCF file, with their respective pooled allele frequencies:

1 15211 rs78601809 T G 100 PASS AC=3050;AF=0.609026;AN=5008;NS=2504;DP=32245;EAS_AF=0.504;AMR_AF=0.6772;AFR_AF=0.5371;EUR_AF=0.7316;SAS_AF=0.6401;AA=t|||;VT=SNP
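
For reference, the global AF in that record is AC divided by AN (alternate allele count over total called alleles), and the per-population tags (EAS_AF, AMR_AF, and so on) are the same kind of frequency computed within each super-population. A minimal check in Python; `parse_info` is a hypothetical helper written for this post, not part of any VCF library:

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

# INFO column copied from the 1000 Genomes record above
info = parse_info(
    "AC=3050;AF=0.609026;AN=5008;NS=2504;DP=32245;EAS_AF=0.504;"
    "AMR_AF=0.6772;AFR_AF=0.5371;EUR_AF=0.7316;SAS_AF=0.6401;AA=t|||;VT=SNP"
)

# AF = AC / AN
af = int(info["AC"]) / int(info["AN"])
print(round(af, 6))  # 0.609026, matching the AF tag
```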

Our format is:

Sample 1  rep1
Sample 1  rep2
Sample 1  rep3
Sample 2  rep1
Sample 2  rep2
Sample 2  rep3

We have an individual VCF file for each replicate of each sample. I don't know how to put all the replicates' information into a single VCF file and get a pooled AF for each sample, as in the format above.

Many thanks.

VCF SNP-Calling 1000-Genomes-Project • 2.9k views

You can use vcf-merge from VCFtools to create a single VCF file using all of the individual VCF files as inputs. See also: How Can I Merge A Large Amount Of Vcf Files?


Hi Stephen, thanks for your reply. Simple merging won't solve my problem. Will I get per-sample allele frequencies (pooled from replicates), as highlighted in the example above?


You'll get the information for each sample in its own new column, appended to the end of the line for each site.
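
Once the replicates sit side by side as genotype columns, a pooled AF per sample can be computed by counting alleles across that sample's replicate columns. A rough sketch in Python, under several assumptions: the site is biallelic, the merged file has a GT field, and the column names (`S1_rep1`, ...) and genotypes below are made up for illustration. `pooled_af` is a hypothetical helper, not a VCFtools function.

```python
from collections import defaultdict

def pooled_af(vcf_line, column_names, groups):
    """Return {group: alt allele frequency} for one biallelic site.

    column_names: genotype column names from the #CHROM header, in order.
    groups: maps each column name to the sample (population) it belongs to.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    alt_count = defaultdict(int)     # AC per group
    allele_total = defaultdict(int)  # AN per group
    for name, cell in zip(column_names, fields[9:]):
        gt = cell.split(":")[gt_index]
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call: contributes to neither AC nor AN
            allele_total[groups[name]] += 1
            if allele != "0":
                alt_count[groups[name]] += 1
    return {g: alt_count[g] / allele_total[g] for g in allele_total}

# Hypothetical merged record: two samples, three replicates each
columns = ["S1_rep1", "S1_rep2", "S1_rep3", "S2_rep1", "S2_rep2", "S2_rep3"]
groups = {c: c.split("_")[0] for c in columns}
line = "1\t15211\trs78601809\tT\tG\t100\tPASS\t.\tGT\t0/1\t1/1\t0/1\t0/0\t0/1\t./."

result = pooled_af(line, columns, groups)
print(result)  # S1: 4 alt of 6 alleles; S2: 1 alt of 4 called alleles
```

Missing genotypes are skipped rather than counted as reference. With RNA-Seq calls that matters, since a replicate with no call at a site may simply lack coverage there.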


Thanks Stephen.
