Question

get protein sequence from ID

0

9.7 years ago

arronslacey ▴ 320

Hi - could someone point me to a database that I could query to find the protein sequence when given a protein ID (e.g. P51811)? I'm familiar with doing some queries on the UCSC server, but there are so many tables!

Thanks very much.

ANSWER

Thank you to @themysticgeek and @Emily_Ensembl for pointing out the UniProt REST API - I had forgotten. On this recommendation I botched this little bash script together to get the sequences from a list of IDs. Hope it helps anyone (feel free to optimize the code if you want!)

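#!/bin/bash

# Download FASTA sequences given a file of UniProt IDs.
# Usage: bash <script> <id_file> <output_dir>

file=$1   # text file of UniProt IDs, separated by whitespace
name=$2   # directory to download into

# Read the ID list before changing directory
list=$(cat "${file}")

mkdir -p "${name}"
# Keep a copy of the ID list next to the results
# (the original line copied ${2} into itself, which looks like a slip)
cp "${file}" "${name}"
cd "${name}" || exit 1

for word in ${list}
do
    # One FASTA file per ID
    wget -nv "http://www.uniprot.org/uniprot/${word}.fasta"
done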
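
The script above leaves one .fasta file per ID, each with a ">" header line followed by the sequence wrapped over several lines. If only the bare sequence is wanted, a minimal sketch along these lines may help. The function name fasta_to_seq is my own, and it assumes each file holds a single record, which is what a per-ID download should give.

```shell
#!/bin/bash

# Print the sequence from a single-record FASTA file as one line.
# fasta_to_seq is a hypothetical helper name, not part of any tool.
fasta_to_seq() {
    # Drop the ">" header line, then remove line breaks
    # (including any Windows-style carriage returns).
    grep -v '^>' "$1" | tr -d '\r\n'
    # Finish with a single newline.
    echo
}
```

For example, `fasta_to_seq P51811.fasta` would print the sequence on one line. A file with several records would have its sequences run together, so this is only meant for one protein per file.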

SNP sequence protein gene • 4.4k views

updated 2.5 years ago by Ram 44k • written 9.7 years ago by arronslacey ▴ 320

1

9.7 years ago

Emily 24k

Have you tried putting your IDs into the search box in UniProt?

Alternatively, if you have a long list of them that you want sequences for and they're all UniProt IDs like this one, you could try BioMart. There's a help video to get you started here. Use:

Database: Ensembl genes

Filters: ID list limit; pick UniProt from the dropdown and paste in your list

Attributes: Sequences, then protein sequences

updated 2.5 years ago by Ram 44k • written 9.7 years ago by Emily 24k

Ram · Accepted Answer · 2015-03-04

4

9.7 years ago

kautilya ▴ 430

P51811 is a UniProt ID. To get its sequence you can simply use the URL http://www.uniprot.org/uniprot/YOUR_PROTEIN_ID.fasta

e.g. http://www.uniprot.org/uniprot/P51811.fasta

Besides this, there are many other ways to access the sequence; the details can be found in the UniProt REST guide.
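
As a rough sketch of using that URL pattern from the command line, something like the following should work. The function names are made up for illustration, and the URL is simply the one quoted in this answer; UniProt may have moved its endpoints since, so curl is told to follow redirects, and whether the old address still resolves is an assumption.

```shell
#!/bin/bash

# Build the FASTA URL for one UniProt ID, following the pattern
# quoted in the answer. (Hypothetical helper name.)
uniprot_fasta_url() {
    printf 'http://www.uniprot.org/uniprot/%s.fasta\n' "$1"
}

# Fetch the FASTA record for one ID and print it to stdout.
# -s: no progress output, -f: fail on HTTP errors, -L: follow redirects.
# (Hypothetical helper name.)
fetch_uniprot_fasta() {
    curl -sfL "$(uniprot_fasta_url "$1")"
}
```

For example, `fetch_uniprot_fasta P51811 > P51811.fasta` would save the record to a file, and the function returns a non-zero status if the request fails.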

updated 2.5 years ago by Ram 44k • written 9.7 years ago by kautilya ▴ 430