Hi,
I am trying to download the Pfam annotations for all the proteins (predicted or known) in UniProt. The main issue is that I also need the genome sequences and gene coordinates for all of these proteins, and there seems to be no way to get these from UniProt.
Does anyone have relevant experience with this?
Thanks a lot!
This is not possible with just core UniProtKB.
To get the locations of Pfam hits, you need to combine UniParc with UniProtKB. The genomic locations of the genes encoding UniProtKB entries are currently available on our FTP site, for one organism/proteome at a time: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/genome_annotation_tracks/
Otherwise you will need to map the proteins to the genome yourself, e.g. via Ensembl.
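If you do end up mapping things yourself, the core step is converting a domain hit given in protein (residue) coordinates into genomic coordinates. Below is a minimal sketch of that arithmetic, not an Ensembl or UniProt API. It assumes you already have the coding exon segments of the transcript as 1-based inclusive genomic intervals (for example from a GTF/GFF file) and that the CDS starts in frame at the first coding base. The function name `protein_range_to_genomic` and its parameters are made up for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def protein_range_to_genomic(
    cds_exons: List[Interval],
    strand: str,
    aa_start: int,
    aa_end: int,
) -> List[Interval]:
    """Map a residue range (e.g. a Pfam hit) onto genomic intervals.

    cds_exons: coding segments of one transcript, genomic coordinates.
    strand: '+' or '-'.
    aa_start, aa_end: 1-based inclusive residue positions.
    Returns genomic intervals sorted by ascending start.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid residue range")

    # 0-based inclusive offsets into the spliced coding sequence.
    nt_start = (aa_start - 1) * 3
    nt_end = aa_end * 3 - 1

    total = sum(end - start + 1 for start, end in cds_exons)
    if nt_end >= total:
        raise ValueError("residue range extends past the end of the CDS")

    # Walk exons in transcription order: ascending on '+', descending on '-'.
    ordered = sorted(cds_exons, reverse=(strand == "-"))

    result: List[Interval] = []
    consumed = 0  # coding bases covered by the exons already visited
    for start, end in ordered:
        length = end - start + 1
        lo = max(nt_start, consumed)
        hi = min(nt_end, consumed + length - 1)
        if lo <= hi:  # this exon overlaps the requested range
            if strand == "+":
                result.append((start + (lo - consumed), start + (hi - consumed)))
            else:
                result.append((end - (hi - consumed), end - (lo - consumed)))
        consumed += length
    return sorted(result)


# Toy transcript: two coding exons of 9 bases (3 residues) each.
exons = [(100, 108), (200, 208)]
print(protein_range_to_genomic(exons, "+", 2, 5))  # [(103, 108), (200, 205)]
print(protein_range_to_genomic(exons, "-", 1, 2))  # [(203, 208)]
```

A hit that spans an intron comes back as several intervals, one per exon it touches, which is usually what you want when writing BED or GFF output.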
Which output do you need?
Ideally: whole genome sequence, gene annotations for the genome with genomic coordinates, predicted Pfam domains for each predicted or observed protein (possibly with the coordinates of each domain hit).