Pfam annotations from UniProt
7.2 years ago

Hi,

I am trying to download the Pfam annotations for all proteins (predicted or known) in UniProt. The main issue is that I also need the genome sequences and gene coordinates for all of these proteins, and there seems to be no way to get these from UniProt.

Does anyone have relevant experience with this?

Thanks a lot!

pfam uniprot • 5.1k views

This is not possible with just core UniProtKB.

To get the locations of Pfam hits, one needs to combine UniParc with UniProtKB. The genomic locations of the genes encoding UniProtKB entries are currently available per organism/proteome on our FTP site: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/genome_annotation_tracks/

Otherwise you will need to map the proteins to the genome yourself, e.g. via Ensembl.


Which output do you need?


Ideally: whole genome sequence, gene annotations for the genome with genomic coordinates, predicted Pfam domains for each predicted or observed protein (possibly with the coordinates of each domain hit).

7.1 years ago

Hi,

  1. "I am trying to download the Pfam annotations for all the proteins present (predicted / known) in Uniprot."

You can download protein entries from UniProt together with their Pfam IDs:

a. Browse to Swiss-Prot (for reviewed entries) and restrict the results to an organism (e.g. Human).

b. Click on the "column" tab (above the displayed list). It shows several tabs; click on "Family and domain", select "Pfam", and save.

c. Select all entries and click the "Download" tab. You will get a tab-separated file that includes the Pfam IDs.

Eg: "http://www.uniprot.org/uniprot/?query=*&fil=organism%3A%22Homo+sapiens+%28Human%29+%5B9606%5D%22+AND+reviewed%3Ayes"
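Once you have the tab-separated file from step c, the Pfam IDs can be pulled out per entry with a few lines of Python. This is only a sketch: the column names `Entry` and `Pfam`, and the semicolon-separated layout of the Pfam column, are assumptions about the export (check the header of your own file), and the accessions in the sample are made-up placeholders.

```python
import csv
import io


def pfam_ids_by_entry(tsv_text, entry_col="Entry", pfam_col="Pfam"):
    """Map each accession to its list of Pfam IDs.

    Assumes a header row and a Pfam column holding semicolon-separated
    IDs such as "PF00069;PF07714;" (both column names are assumptions).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        raw = row.get(pfam_col) or ""
        # Drop the empty piece left by the trailing semicolon.
        result[row[entry_col]] = [p.strip() for p in raw.split(";") if p.strip()]
    return result


# Tiny made-up sample in the assumed layout; the second entry has no Pfam hit.
sample = (
    "Entry\tEntry name\tPfam\n"
    "P00001\tEXAMPLE1_HUMAN\tPF00069;PF07714;\n"
    "P00002\tEXAMPLE2_HUMAN\t\n"
)
print(pfam_ids_by_entry(sample))
```

For a real download you would read the file contents (e.g. `open(path).read()`) and pass them in instead of `sample`. Note that this gives Pfam IDs only, not the positions of the domain hits within each protein.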

  2. "The main issue is I also need to download genome sequences and gene coordinates for all of these proteins, and it seems like there is no way to do this from UniProt."

You can map the UniProt IDs to the gene IDs (Ensembl or RefSeq) used in a standard GTF file for the genome; the GTF file then gives you the gene coordinates.
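A rough sketch of that lookup, assuming you already have a UniProt-to-gene-ID mapping (here a hypothetical dict called `mapping`) and a GTF file that contains `gene` feature rows with a `gene_id` attribute. All IDs and coordinates in the sample are invented for illustration.

```python
import re


def gene_coordinates(gtf_text):
    """Return {gene_id: (chrom, start, end, strand)} from GTF 'gene' rows."""
    coords = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        # GTF has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) != 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match:
            coords[match.group(1)] = (
                fields[0], int(fields[3]), int(fields[4]), fields[6]
            )
    return coords


def locate_proteins(uniprot_to_gene, coords):
    """Attach coordinates to each accession; None when the gene is absent."""
    return {acc: coords.get(gene) for acc, gene in uniprot_to_gene.items()}


# Made-up GTF lines and a made-up accession-to-gene mapping.
gtf_sample = (
    "#!made-up example\n"
    'chr1\texample\tgene\t1000\t5000\t.\t+\t.\tgene_id "GENE0001"; gene_name "ABC1";\n'
    'chr1\texample\texon\t1000\t1200\t.\t+\t.\tgene_id "GENE0001"; transcript_id "TX0001";\n'
    'chr2\texample\tgene\t300\t900\t.\t-\t.\tgene_id "GENE0002";\n'
)
mapping = {"P00001": "GENE0001", "P00002": "GENE0002", "P00003": "GENE9999"}
located = locate_proteins(mapping, gene_coordinates(gtf_sample))
print(located)
```

Accessions whose gene ID is not in the GTF come back as `None`, which makes unmapped proteins easy to spot. Some GTF files have no `gene` rows; in that case you would have to derive gene spans from the transcript or exon rows instead.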
