Calculating Tajima's D using R and VCF data from 1000 genomes
8.1 years ago
brwomackb1 ▴ 20

Hi everyone, I'm an undergrad researching possible selection on the gene ADAM33, which is associated with asthma through various SNPs. Using Tajima's D, I hope to tell whether the increased number of asthma cases is a product of our post-industrial environment (in which case the region should look neutral, since this environment developed too recently for selection to act) or whether these variants have always been affecting people (in which case there should be some form of negative selection on the SNP alleles). I know how to get the VCF data from the 1000 Genomes Project; here is the URL they gave me for my region of interest:

http://browser.1000genomes.org/tmp/slicer/20.3648612-3662893.ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

I can also import it into RStudio by "importing dataset from URL". From there I am unsure how to go about calculating Tajima's D in R using this VCF data. My knowledge of R is extremely limited compared to many of the people on here, so I was wondering whether anyone has done anything similar or has any advice to give. I'm assuming that, because VCF data is really just SNP data, it can be used to calculate Tajima's D. Hope to hear from you all, thank you!

R VCF 1000 Genomes Tajima's D • 4.3k views
8.1 years ago
anp375 ▴ 190

I want to know how to do this too, so here's a start that may be useful, though I don't know what this vcf looks like and don't know how to calculate Tajima's D:

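library(tidyverse)  # mutate (dplyr), str_extract (stringr), separate (tidyr)
# Assumes vcf is a data frame with an INFO column whose first three fields are AC, AF, AN
ACAFAN <- mutate(vcf, ACAFAN = str_extract(INFO, "[^;]+;[^;]+;[^;]+"))
ACAFAN <- separate(ACAFAN, ACAFAN, c("AC", "AF", "AN"), sep=";")  # values keep their "AC=" style prefixes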
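For the Tajima's D part itself, here is a minimal base R sketch of the standard formula, working from per-site alternate allele counts. It is only a sketch under some assumptions: every site is a biallelic SNP, the number of sampled chromosomes n (the AN value) is the same at every site, and there are no missing genotypes. The function name tajima_d and the toy vectors are made up for illustration and are not from any package.

```r
# Tajima's D from per-site alternate allele counts (biallelic sites only).
# ac: alternate allele count at each site (the AC value)
# n:  number of sampled chromosomes (the AN value), assumed equal at all sites
tajima_d <- function(ac, n) {
  ac <- ac[ac > 0 & ac < n]            # keep only sites that segregate in the sample
  S <- length(ac)                      # number of segregating sites
  if (S == 0) return(NA_real_)         # D is undefined without segregating sites

  # Mean number of pairwise differences, summed over sites
  pi_hat <- sum(2 * ac * (n - ac) / (n * (n - 1)))

  # Constants from Tajima's derivation
  i  <- seq_len(n - 1)
  a1 <- sum(1 / i)
  a2 <- sum(1 / i^2)
  b1 <- (n + 1) / (3 * (n - 1))
  b2 <- 2 * (n^2 + n + 3) / (9 * n * (n - 1))
  c1 <- b1 - 1 / a1
  c2 <- b2 - (n + 2) / (a1 * n) + a2 / a1^2
  e1 <- c1 / a1
  e2 <- c2 / (a1^2 + a2)

  # Difference between the two diversity estimates, scaled by its standard deviation
  (pi_hat - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
}

# Toy data: 10 chromosomes, 5 sites
d_rare   <- tajima_d(rep(1, 5), n = 10)   # only singletons: negative D
d_common <- tajima_d(rep(5, 5), n = 10)   # only intermediate-frequency alleles: positive D
```

With the ACAFAN table from above, the counts could probably be pulled out with something like as.numeric(sub("AC=", "", ACAFAN$AC)). Multiallelic sites (AC values with commas) would come out as NA and would need to be dropped first.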