Get DNA and Amino Acid Sequence from the UCSC Genome Browser

10.2 years ago

abhikdeora • 0

I'm trying to get both the nucleotide and amino acid sequence for a given region of the mm10 genome, preferably in plain text format.

I know that I can get the DNA sequence for the region by going to http://genome.ucsc.edu/cgi-bin/das/mm10/dna?segment=chr3:93396405,93396500, for example.

Is there a similar link that provides the translated amino acid sequence for the same region in plain text format?

I've tried to use the Table Browser, but it never gives me the exact sequence I want.

Thanks much.

translation genome ucsc amino-acid • 7.4k views

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by abhikdeora • 0

10.2 years ago

Pierre Lindenbaum 166k

You could translate the DNA with the following XSLT stylesheet:

$ xsltproc translate.xsl "http://genome.ucsc.edu/cgi-bin/das/mm10/dna?segment=chr3:93396405,93396500"

>translate(chr3:93396405-93396500)
ARPESSPGSERQTRPESSPGSERQARPESSPG

	<?xml version='1.0' encoding="UTF-8" ?>
	<xsl:stylesheet
	xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
	version='1.0'
	>
	<xsl:output method="text" />

	<xsl:template match="/">
	<xsl:apply-templates select="DASDNA"/>
	</xsl:template>

	<xsl:template match="DASDNA">
	<xsl:apply-templates select="SEQUENCE"/>
	</xsl:template>

	<xsl:template match="SEQUENCE">
	<xsl:text>>translate(</xsl:text>
	<xsl:value-of select="concat(@id,':',@start,'-',@stop)"/>
	<xsl:text>)&#10;</xsl:text>
	<xsl:apply-templates select="DNA"/>
	<xsl:text>&#10;</xsl:text>
	</xsl:template>

	<!-- Strip all whitespace and upper-case the bases before translating. -->
	<xsl:template match="DNA">
	<xsl:call-template name="prot">
	<xsl:with-param name="s" select="translate(translate(normalize-space(text()),'atgc','ATGC'),' ','')"/>
	</xsl:call-template>
	</xsl:template>

	<xsl:template name="prot">
	<xsl:param name="s"/>
	<xsl:if test="string-length($s)>2">
	<xsl:variable name="codon" select="substring($s,1,3)"/>
	<xsl:choose>
	<xsl:when test="$codon = 'ATT'"><xsl:text>I</xsl:text></xsl:when>
	<xsl:when test="$codon = 'ATC'"><xsl:text>I</xsl:text></xsl:when>
	<xsl:when test="$codon = 'ATA'"><xsl:text>I</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CTT'"><xsl:text>L</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CTC'"><xsl:text>L</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CTA'"><xsl:text>L</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CTG'"><xsl:text>L</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TTA'"><xsl:text>L</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TTG'"><xsl:text>L</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GTT'"><xsl:text>V</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GTC'"><xsl:text>V</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GTA'"><xsl:text>V</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GTG'"><xsl:text>V</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TTT'"><xsl:text>F</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TTC'"><xsl:text>F</xsl:text></xsl:when>
	<xsl:when test="$codon = 'ATG'"><xsl:text>M</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TGT'"><xsl:text>C</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TGC'"><xsl:text>C</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GCT'"><xsl:text>A</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GCC'"><xsl:text>A</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GCA'"><xsl:text>A</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GCG'"><xsl:text>A</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GGT'"><xsl:text>G</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GGC'"><xsl:text>G</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GGA'"><xsl:text>G</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GGG'"><xsl:text>G</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CCT'"><xsl:text>P</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CCC'"><xsl:text>P</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CCA'"><xsl:text>P</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CCG'"><xsl:text>P</xsl:text></xsl:when>
	<xsl:when test="$codon = 'ACT'"><xsl:text>T</xsl:text></xsl:when>
	<xsl:when test="$codon = 'ACC'"><xsl:text>T</xsl:text></xsl:when>
	<xsl:when test="$codon = 'ACA'"><xsl:text>T</xsl:text></xsl:when>
	<xsl:when test="$codon = 'ACG'"><xsl:text>T</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TCT'"><xsl:text>S</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TCC'"><xsl:text>S</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TCA'"><xsl:text>S</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TCG'"><xsl:text>S</xsl:text></xsl:when>
	<xsl:when test="$codon = 'AGT'"><xsl:text>S</xsl:text></xsl:when>
	<xsl:when test="$codon = 'AGC'"><xsl:text>S</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TAT'"><xsl:text>Y</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TAC'"><xsl:text>Y</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TGG'"><xsl:text>W</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CAA'"><xsl:text>Q</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CAG'"><xsl:text>Q</xsl:text></xsl:when>
	<xsl:when test="$codon = 'AAT'"><xsl:text>N</xsl:text></xsl:when>
	<xsl:when test="$codon = 'AAC'"><xsl:text>N</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CAT'"><xsl:text>H</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CAC'"><xsl:text>H</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GAA'"><xsl:text>E</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GAG'"><xsl:text>E</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GAT'"><xsl:text>D</xsl:text></xsl:when>
	<xsl:when test="$codon = 'GAC'"><xsl:text>D</xsl:text></xsl:when>
	<xsl:when test="$codon = 'AAA'"><xsl:text>K</xsl:text></xsl:when>
	<xsl:when test="$codon = 'AAG'"><xsl:text>K</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CGT'"><xsl:text>R</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CGC'"><xsl:text>R</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CGA'"><xsl:text>R</xsl:text></xsl:when>
	<xsl:when test="$codon = 'CGG'"><xsl:text>R</xsl:text></xsl:when>
	<xsl:when test="$codon = 'AGA'"><xsl:text>R</xsl:text></xsl:when>
	<xsl:when test="$codon = 'AGG'"><xsl:text>R</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TAA'"><xsl:text>*</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TAG'"><xsl:text>*</xsl:text></xsl:when>
	<xsl:when test="$codon = 'TGA'"><xsl:text>*</xsl:text></xsl:when>
	<xsl:otherwise><xsl:text>?</xsl:text></xsl:otherwise>
	</xsl:choose>
	<xsl:call-template name="prot">
	<xsl:with-param name="s" select="substring($s,4)"/>
	</xsl:call-template>
	</xsl:if>
	</xsl:template>


	</xsl:stylesheet>

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by Pierre Lindenbaum 166k

That XSLT stylesheet doesn't seem to translate the sequence in the correct reading frame. Based on the UCSC genome browser, the correct amino acid sequence for mm10 chr3:93396405,93396500 is QDSPHRGQK...

I'm trying to get the amino acid sequence that displays in the UCSC genome browser itself.

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by abhikdeora • 0

That's because the reading frame starts at 93396406. With the DAS segment starting there:

$ xsltproc jeter.xsl "http://genome.ucsc.edu/cgi-bin/das/mm10/dna?segment=chr3:93396406,93396500"

>translate(chr3:93396406-93396500)
QDQSPHRGQKGRQDQSPHQGQKGRQDQSPHR
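To illustrate why moving the segment start by a single base changes the whole translation, here is a minimal Python sketch that translates a sequence in the three forward reading frames using the standard genetic code. The function names `translate` and `forward_frames` are made up for this example, and the input below is a short toy sequence, not the actual mm10 sequence.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, offset=0):
    """Translate dna from the 0-based offset; unknown codons become '?'."""
    # Remove all whitespace (DAS output is line-wrapped) and upper-case.
    seq = "".join(dna.split()).upper()[offset:]
    # Step through complete codons only; trailing 1-2 bases are ignored.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "?") for i in range(0, len(seq) - 2, 3)
    )


def forward_frames(dna):
    """Return the translations for offsets 0, 1 and 2."""
    return [translate(dna, offset) for offset in range(3)]


if __name__ == "__main__":
    # Toy sequence for illustration only.
    for offset, protein in enumerate(forward_frames("GCAAGACCAGAA")):
        print(offset, protein)
```

Each offset yields a completely different peptide, which is why the translation has to start on the first base of a codon of the transcript you care about. Reverse-strand genes would additionally need the reverse complement, which this sketch does not handle.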

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by Pierre Lindenbaum 166k

So if you want to know the amino acid sequence translated for the gene at this position, the stylesheet alone won't work, because it knows nothing about the gene's reading frame. You should work with the knownGene table to get the positions of the exons.
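As a rough illustration of the exon lookup, here is a Python sketch that checks whether a genomic position falls in the coding part of an exon, given values taken from one knownGene row. It assumes the usual UCSC conventions for that table (0-based, half-open coordinates; `exonStarts` and `exonEnds` stored as comma-separated lists with a trailing comma) and assumes the query position is 1-based, as displayed in the browser. The function names and the coordinates in the usage example are made up.

```python
def parse_blocks(field):
    """Parse a comma-separated coordinate list such as '100,300,' into ints."""
    return [int(x) for x in field.split(",") if x]


def coding_exon_index(pos, cds_start, cds_end, exon_starts, exon_ends):
    """Return the 0-based index of the exon containing pos, or None.

    pos is 1-based; all other coordinates are 0-based, half-open.
    None is returned for positions in UTRs, introns, or outside the gene.
    """
    p = pos - 1  # convert the 1-based position to 0-based
    if not (cds_start <= p < cds_end):
        return None  # outside the coding region
    starts = parse_blocks(exon_starts)
    ends = parse_blocks(exon_ends)
    for i, (start, end) in enumerate(zip(starts, ends)):
        if start <= p < end:
            return i
    return None  # inside the coding span but in an intron


if __name__ == "__main__":
    # Hypothetical transcript with three exons and CDS 150..550.
    print(coding_exon_index(151, 150, 550, "100,300,500,", "200,400,600,"))
    print(coding_exon_index(250, 150, 550, "100,300,500,", "200,400,600,"))
```

The exon index is in genomic order, so for a transcript on the minus strand it counts from the 3' end of the transcript.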

ADD REPLY • link 10.2 years ago by Pierre Lindenbaum 166k

For example, see: Is A Genome Position In An Exon Or Intron?

ADD REPLY • link 10.2 years ago by Pierre Lindenbaum 166k

So are you saying that there is no direct way to download the amino acid sequence for a given region? That's hard to believe, considering that it is displayed in the genome browser.

That's all I'm looking for. A direct way to download the amino acid sequence for a given region. I'd rather avoid MySQL table lookups.

ADD REPLY • link 10.2 years ago by abhikdeora • 0

10.2 years ago

Maximilian Haeussler ★ 1.8k

Can you explain what you're trying to do? It doesn't make a lot of sense to get an amino acid sequence for an arbitrary piece of DNA. If you need the amino acid sequence, you usually click on a transcript first; the resulting page has a link to the protein sequence.

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by Maximilian Haeussler ★ 1.8k

Oh, I'm starting to understand: you're looking at an exon, and you can see the amino acid sequence on the screen. But if you click on the exon, all you can get is the full amino acid sequence of the whole transcript, not the small piece shown on the screen.

Could you explain a bit more what the end goal is? I'm struggling to find a use case where this particular function would be useful...

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by Maximilian Haeussler ★ 1.8k

I'm a rising college freshman with very little bioinformatics experience (although I do have significant programming experience), so please bear with me.

I have a spreadsheet of single nucleotide mutations at certain positions in the mm10 genome (e.g. chr11:3133305 C->A). For each mutation, I'm trying to determine: 1) whether it occurs in a coding region of the genome, 2) whether it yields a change in amino acid, and 3) what that change is.

My plan is this: download the nucleotide and amino acid sequences for a small range around the mutation, determine the reading frame of the DNA sequence, and from there determine the codon in which the mutation occurs. Once the codon is known, it is trivial to find the amino acid change resulting from the mutation.

I already have a Java program written that accomplishes the above given the nucleotide and amino acid sequence - I just need a way to download these sequences for a given region.

As I said earlier, I do have very little bioinformatics experience, so I would welcome suggestions for better ways to accomplish my goal.
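The last step of the plan above (from codon to amino acid change) can be sketched in Python roughly as follows. This is only an illustration, not the Java program mentioned in the post: it assumes you already have the spliced coding sequence on the coding strand, starting at the first base of a codon, and that the mutation position has already been converted to a 0-based index into that sequence. Mapping a genomic coordinate to that index (introns, strand) is not covered. The name `codon_change` is made up.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_change(cds, index, alt):
    """Describe a single-base substitution in an in-frame coding sequence.

    cds   -- coding sequence, starting at the first base of a codon
    index -- 0-based index of the mutated base within cds
    alt   -- the substituted base
    Returns (ref_codon, alt_codon, ref_aa, alt_aa).
    """
    cds = cds.upper()
    offset = index % 3          # position of the base within its codon
    start = index - offset      # index of the first base of that codon
    ref_codon = cds[start:start + 3]
    if index < 0 or len(ref_codon) < 3:
        raise ValueError("index does not fall in a complete codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    return (
        ref_codon,
        alt_codon,
        CODON_TABLE.get(ref_codon, "?"),
        CODON_TABLE.get(alt_codon, "?"),
    )


if __name__ == "__main__":
    # Toy coding sequence: ATG GCC TGA; mutate the middle base of GCC to A.
    print(codon_change("ATGGCCTGA", 4, "A"))
```

If the two amino acids returned are equal, the substitution is synonymous; a `*` as the new amino acid means the mutation introduces a stop codon.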

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by abhikdeora • 0

Use the UCSC Variant Annotation Integrator (VAI); it does exactly that.

The pgSnp format seems to be the easiest in your case; just convert your table to this format:

http://genome.ucsc.edu/FAQ/FAQformat.html#format10

1) upload your table as a custom track here http://genome.ucsc.edu/cgi-bin/hgCustom

2) go to the VAI http://genome.ucsc.edu/cgi-bin/hgVai

3) select your custom track and click "get results"
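For the conversion step, a small Python sketch along these lines might help. It assumes each spreadsheet entry looks like the example in the question (`chr11:3133305 C->A`) with a 1-based position, and it assumes the pgSnp column layout is chrom, chromStart (0-based), chromEnd, alleles separated by `/`, allele count, allele frequencies, and allele scores, with 0 used where frequencies and scores are unknown. Please check this against the format description linked above before relying on it. The name `to_pgsnp` is made up.

```python
import re

# Matches entries such as "chr11:3133305 C->A".
MUTATION_RE = re.compile(r"^(chr\w+):(\d+)\s+([ACGT])->([ACGT])$")


def to_pgsnp(mutation):
    """Convert 'chrN:pos REF->ALT' (1-based pos) to a tab-separated pgSnp line."""
    match = MUTATION_RE.match(mutation.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    chrom, pos, ref, alt = match.groups()
    end = int(pos)       # 1-based position equals the half-open end
    start = end - 1      # 0-based start
    # Assumption: list both alleles (heterozygous site); 0 = unknown
    # frequency and score for each allele.
    fields = [chrom, str(start), str(end), f"{ref}/{alt}", "2", "0,0", "0,0"]
    return "\t".join(fields)


if __name__ == "__main__":
    print(to_pgsnp("chr11:3133305 C->A"))
```

Listing both the reference and the mutated base is a guess about the samples; if only the mutated base should be reported, the allele column and the count, frequency, and score columns would need to be adjusted accordingly.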

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by Maximilian Haeussler ★ 1.8k

That seems perfect, thanks a lot! I'll definitely use it.

For the sake of having the question answered, is there a way to download the amino acid sequence for a region in plain text? I'd hate for someone who needs that and arrives at this thread to not find an answer.

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by abhikdeora • 0

Not that I know of. You can click on a transcript and get the full amino acid sequence, but not just the slice currently in view.

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by Maximilian Haeussler ★ 1.8k

"I have a spreadsheet of single nucleotide mutations at certain positions in the mm10 genome (e.g. chr11:3133305 C->A). For each mutation, I'm trying to determine: 1) whether it occurs in a coding region of the genome, 2) whether it yields a change in amino acid, and 3) what that change is."

So use something like Ensembl VEP: http://www.ensembl.org/info/docs/tools/vep/index.html

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.2 years ago by Pierre Lindenbaum 166k