9.1 years ago
devenvyas
I'm trying to obtain ancestral alleles (i.e., those of the Human-Chimp common ancestor) for almost 600,000 SNPs. Many of them have dbSNP rs numbers, but many don't.
I am lost as to how to obtain these data and am wondering if someone can point me in the right direction.
But how would I extract that data from the 1000 Genomes project?
That thread confused me a bit. Are they saying that the ancestral alleles from 1000 Genomes are representative of the Human-Chimp common ancestor from Ensembl?
Also, I stumbled across ftp://ftp.ensembl.org/pub/release-82/fasta/ancestral_alleles/homo_sapiens.tar.gz, but I was unclear whether this is equivalent to the Human-Chimp ancestor. I was wondering if someone could help on this matter. Thanks!
Historically, primate alleles were used as ancestral alleles for evolutionary reasons. When I first started working with these huge resources a few years ago, dbSNP used the chimp's and Ensembl used the macaque's. Right now, 1000g uses a combination of primates, which allows them to provide a deeper (and, in my opinion, more appropriate) ancestral description:
http://www.1000genomes.org/faq/where-does-ancestral-allele-information-your-variants-come
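If you go the 1000g route, the ancestral allele can be read straight from the VCF records. A minimal sketch, assuming the allele is stored in an INFO tag named AA and that the value may carry extra pipe-separated fields after the allele (e.g. AA=g|||); the function names are hypothetical, so check the header of the release you download:

```python
def ancestral_allele(info_field):
    """Return the ancestral allele from a VCF INFO string, or None if absent.

    Assumes an INFO tag named AA whose value is either a bare allele
    ("AA=T") or an allele followed by pipe-separated fields ("AA=g|||").
    """
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        if key == "AA" and sep:
            allele = value.split("|")[0]
            # Treat an empty value or "." as "no call".
            return allele if allele not in ("", ".") else None
    return None


def parse_vcf_line(line):
    """Return (chrom, pos, variant_id, ancestral_allele) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id = fields[0], int(fields[1]), fields[2]
    # Column 8 (index 7) of a VCF data line is the INFO column.
    return chrom, pos, variant_id, ancestral_allele(fields[7])


# Made-up record for illustration only, not real 1000g data.
example = "1\t12345\trs0000\tA\tG\t100\tPASS\tAC=1;AF=0.0002;AA=g|||;DP=100"
record = parse_vcf_line(example)
```

Because the lookup is by chromosome and position, this works for SNPs without an rs number too, as long as they are in the 1000g call set.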
By the way, the link you provide is a FASTA file. If you would like to get a particular base from there, I would suggest using samtools faidx, although querying 1000g variants directly may be all you need.