I want to get the alignment file (amino acid for protein, and nucleotide for CDS) for a particular gene from the UCSC 100-way multiple alignment of 100 species (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz100way). Is there an easy way to get it?
Thanks
You can find those alignments here. You will need to download the entire set and then find the gene/exons you need in that file.
It may also be worth looking at NCBI's HomoloGene project, where you can search for one specific gene.
Thanks. I got the files (like refGene.exonAA.fa.gz) from the link you provided, but it seems they only include alignments at the exon level, so I would need to merge all the exons myself. Am I right? I'm not sure whether there is a direct way to do this, or a tool that can merge the exon alignments into an alignment of the whole gene.
I also looked at NCBI HomoloGene, but only found alignments for 10 species.
Thank you again for your suggestions.
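In case it helps anyone doing the merge by hand, here is a minimal sketch of concatenating the exon-level records into one sequence per species. It assumes each FASTA header starts with a name of the form transcript_assembly_exonNumber_exonCount (for example NM_000014_hg19_1_36) followed by other fields; please check that against your own file before relying on it. The function names parse_fasta and merge_exons are made up for this example.

```python
from collections import defaultdict


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_exons(lines, transcript):
    """Concatenate the exons of one transcript, per species, in exon order.

    Assumes the first header field looks like
    <transcript>_<assembly>_<exonNumber>_<exonCount>.
    """
    exons = defaultdict(dict)  # assembly -> {exon number: aligned sequence}
    for header, seq in parse_fasta(lines):
        name = header.split()[0]
        # Split from the right, because transcript IDs such as NM_000014
        # contain an underscore themselves.
        tx, assembly, exon_num, _exon_count = name.rsplit("_", 3)
        if tx == transcript:
            exons[assembly][int(exon_num)] = seq
    return {
        assembly: "".join(seq for _, seq in sorted(by_num.items()))
        for assembly, by_num in exons.items()
    }


# Hypothetical usage on the downloaded file:
#   import gzip
#   with gzip.open("refGene.exonAA.fa.gz", "rt") as fh:
#       merged = merge_exons(fh, "NM_000014")
#   for assembly, seq in merged.items():
#       print(f">{assembly}\n{seq}")
```

It is worth checking afterwards that every species ends up with the same merged length. If not, some exon record is missing for that species and the columns no longer line up.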
If your focus is a single gene (largely CDS), it would be worth taking a look at Ensembl Compara, for example (https://www.ensembl.org/Homo_sapiens/Gene/Compara_Ortholog?db=core;g=ENSG00000139618;r=13:32315086-32400266). You can download alignments of 100-plus species for one-to-one orthologs. The UCSC alignments are typically genome alignments.
Thank you for your suggestion. I tried to get the alignment, but it seems to generate the alignment at the gene DNA sequence level (including the noncoding parts), right? I see many long gaps in the alignment. Is there a way to include only the CDS or protein sequence? Thanks