Question

Align human reference genome and ancestral genome

8.1 years ago

ostrokach ▴ 350

I would like to get a list of amino acid mutations that occurred in the course of evolution from early primates to present-day humans.

I found the homo_sapiens_ancestor_GRCh38_e86.tar.gz file on the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-86/fasta/ancestral_alleles/), which, as I understand it, is the inferred genome of the primate ancestor. The archive contains a FASTA sequence for every chromosome.

I can also download the FASTA sequence of the human reference genome: ftp://ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/dna/.

My question is: what tool should I use to align the reference and ancestral genomes and get a VCF file with all the SNPs? Sorry if this has been answered a million times already. Most of the information I found concerned mapping FASTQ files to the reference genome.

Once I have a VCF file with a list of SNPs, I can run TransVar or SnpEff to convert SNPs to amino acid changes.

Thanks!
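For illustration, here is a minimal Python sketch of the comparison described above. It rests on an assumption the post does not state: that each ancestral chromosome sequence uses the same coordinates as the corresponding reference chromosome, so the two can be compared position by position without an alignment step. The function name `find_substitutions`, the toy sequences, and the decision to skip anything other than A, C, G, T (and to ignore letter case) are all hypothetical choices; the README shipped with the archive should be checked for the actual symbol conventions.

```python
def find_substitutions(ancestral, reference):
    """Yield (1-based position, ancestral base, reference base) where the two differ.

    Assumes both strings cover the same chromosome in the same coordinates.
    """
    if len(ancestral) != len(reference):
        raise ValueError("sequences must share one coordinate system")
    valid = set("ACGT")
    for pos, (anc, ref) in enumerate(zip(ancestral, reference), start=1):
        # Case is ignored here; that is a simplification of this sketch.
        anc, ref = anc.upper(), ref.upper()
        # Skip positions where either side is not a called nucleotide
        # (for example N, '-' or '.').
        if anc in valid and ref in valid and anc != ref:
            yield pos, anc, ref


# Toy sequences, not real data.
ancestral = "ACGTN-ac.T"
reference = "ACCTAGATGT"
substitutions = list(find_substitutions(ancestral, reference))
for pos, anc, ref in substitutions:
    print(pos, anc, ref)
```

Each reported tuple could then be written out as one VCF record (chromosome, position, ancestral base, reference base) for annotation with a tool such as SnpEff.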

genome alignment • 2.9k views


score 2 · Accepted Answer · 2016-10-07

8.1 years ago

Jean-Karim Heriche 27k

I am not sure you're going about this in the right way. If you're interested in amino-acid changes in proteins, you can get this information from a phylogenetic tree. You can find gene trees in Ensembl.


Thanks very much for your input! I was following the methods for CADD, but I guess they had to do the alignments at the genome level in order to be able to analyse non-coding variants. I looked at your Ensembl link, but could not find any pairwise alignments that I could download. A quick Google search led me to TreeFam, which allows you to download protein-protein mappings between two species (http://www.treefam.org/download#tabview=tab1). I guess I can use that mapping and perform pairwise amino acid alignments myself?

ADD REPLY • link 8.1 years ago by ostrokach ▴ 350

You don't need pairwise alignments; what you need are the multiple sequence alignments used to build the trees. Ensembl adapted the TreeFam pipeline for its Compara database. You should be able to get alignments, HMMs and trees from both resources. For Ensembl, it may be easier to use the Perl API.

ADD REPLY • link 8.1 years ago by Jean-Karim Heriche 27k
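As a rough illustration of the suggestion above, the following Python sketch lists amino acid substitutions between two rows taken from a protein multiple sequence alignment. It is a sketch under stated assumptions, not the Ensembl or TreeFam pipeline: the function name `aa_changes` and the two toy rows are hypothetical, the rows are assumed to have already been extracted from the alignment as equal-length strings with `-` for gaps, and positions are numbered along the human sequence.

```python
def aa_changes(ancestor_row, human_row):
    """Return substitutions such as 'K2R', numbered along the human sequence."""
    if len(ancestor_row) != len(human_row):
        raise ValueError("aligned rows must have equal length")
    changes = []
    human_pos = 0
    for anc, hum in zip(ancestor_row, human_row):
        # Only residues present in the human row advance the human numbering.
        if hum != "-":
            human_pos += 1
        # Report substitutions only; columns with a gap on either side are skipped.
        if anc != "-" and hum != "-" and anc != hum:
            changes.append(f"{anc}{human_pos}{hum}")
    return changes


# Toy aligned rows, not real data.
ancestor_row = "MKT-LLVAS"
human_row = "MRTALL-AT"
changes = aa_changes(ancestor_row, human_row)
print(changes)
```

Insertions and deletions are ignored here for brevity; handling them would require recording the gapped columns separately.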