How do you get the amino acid position of a particular residue in a protein from the UCSC browser? The link http://bit.ly/hiGIMu shows the methionine in green. How do I find the amino acid position of this residue?
You can use the UCSC mysql server to get the positions of the exons.
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e \
'select * from knownGene where name="uc001opa.1"\G'
*************************** 1. row ***************************
name: uc001opa.1
chrom: chr11
strand: +
txStart: 69165053
txEnd: 69178423
cdsStart: 69165262
cdsEnd: 69175231
exonCount: 5
exonStarts: 69165053,69166979,69167780,69171942,69175066,
exonEnds: 69165460,69167195,69167940,69172091,69178423,
proteinID: P24385
alignID: uc001opa.1
Here your transcript contains 5 exons. UCSC table coordinates are 0-based with exclusive ends, so the first exon starts at 69165053 and ends at 69165460 - 1. That first exon also contains the first translated base, at 69165262.
Using the reference sequence for 'chr11', you then 'just' have to walk over each exon to translate your protein until you've found your amino acid.
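If you only need the residue number and not the residue itself, the walk over the exons can be done on coordinates alone. Here is a minimal sketch, assuming a 0-based genomic position (the browser displays 1-based positions, so subtract 1 first). The function names are mine, not part of any UCSC tool.

```python
def parse_ucsc_list(field):
    """Turn a comma-terminated UCSC list such as '1,2,3,' into [1, 2, 3]."""
    return [int(x) for x in field.split(",") if x]


def coding_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip every exon to [cds_start, cds_end); drop exons that are entirely UTR."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            blocks.append((s, e))
    return blocks


def genomic_to_aa(pos, strand, blocks):
    """Map a 0-based genomic position to (amino acid number, codon position).

    Both returned values are 1-based. Returns None when pos is not coding.
    """
    # On the minus strand, translation starts from the last block.
    ordered = blocks if strand == "+" else list(reversed(blocks))
    offset = 0  # coding bases seen before the current block
    for s, e in ordered:
        if s <= pos < e:
            offset += (pos - s) if strand == "+" else (e - 1 - pos)
            return offset // 3 + 1, offset % 3 + 1
        offset += e - s
    return None


# Values copied from the knownGene row for uc001opa.1 shown above.
EXON_STARTS = parse_ucsc_list("69165053,69166979,69167780,69171942,69175066,")
EXON_ENDS = parse_ucsc_list("69165460,69167195,69167940,69172091,69178423,")
BLOCKS = coding_blocks(EXON_STARTS, EXON_ENDS, 69165262, 69175231)

print(sum(e - s for s, e in BLOCKS))         # 888 coding bases
print(genomic_to_aa(69165262, "+", BLOCKS))  # (1, 1): first base of the start codon
print(genomic_to_aa(69166979, "+", BLOCKS))  # (67, 1): first coding base of exon 2
```

The 888 coding bases correspond to 295 codons plus a stop codon. Note that this sketch ignores anything a plain exon walk cannot see, such as frameshifts or annotated exceptions.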
UPDATE: even easier. There is a table named knownGenePep containing the peptide for a given KnownGene.
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select * from knownGene as K, knownGenePep as P where P.name=K.name and K.name="uc001opa.1"\G'
*************************** 1. row ***************************
name: uc001opa.1
chrom: chr11
strand: +
txStart: 69165053
txEnd: 69178423
cdsStart: 69165262
cdsEnd: 69175231
exonCount: 5
exonStarts: 69165053,69166979,69167780,69171942,69175066,
exonEnds: 69165460,69167195,69167940,69172091,69178423,
proteinID: P24385
alignID: uc001opa.1
name: uc001opa.1
seq: MEHQLLCCEVETIRRAYPDANLLNDRVLRAMLKAEETCAPSVSYFKCVQKEVLPSMRKIVATWMLEVCEEQKCEEEVFPLAMNYLDRFLSLEPVKKSRLQLLGATCMFVASKMKETIPLTAEKLCIYTDNSIRPEELLQMELLLVNKLKWNLAAMTPHDFIEHFLSKMPEAEENKQIIRKHAQTFVALCATDVKFISNPPSMVAAGSVVAAVQGLNLRSPNNFLSYYRLTRFLSRVIKCDPDCLRACQEQIEALLESSLRQAQQNMDPKAAEEEEEEEEEVDLACTPTDVRDVDI
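With the peptide in hand, listing the positions of a given residue is a one-liner. A small sketch follows; `residue_positions` is a hypothetical helper name, and the example string is only the first 60 residues of the sequence above, truncated for brevity.

```python
def residue_positions(peptide, residue="M"):
    """Return the 1-based positions of `residue` in `peptide`."""
    return [i for i, aa in enumerate(peptide, start=1) if aa == residue]


# First 60 residues of the knownGenePep sequence (truncated on purpose).
PREFIX = "MEHQLLCCEVETIRRAYPDANLLNDRVLRAMLKAEETCAPSVSYFKCVQKEVLPSMRKIV"
print(residue_positions(PREFIX))  # [1, 31, 56]
```

Run it on the full `seq` value to get every methionine, then match the one highlighted in the browser by its position.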
To answer that question you'd have to say, "position in a particular protein sequence". After alternative splicing or alternative start sites (for instance) you can have many different answers to this question. There are usually one or more canonical reference sequence(s); maybe you should narrow the question.
In this case, it is uc001opa.1 (CCND1), length=295. Thanks for pointing it out.