Where to download a curated dataframe of the full archive of UniProt?
2.8 years ago
johnnytam100 ▴ 110

I want to download a curated dataframe of the full archive of UniProt.

Any format is fine as long as it is a table (.csv, .tsv, SQL, etc.).

Solution 1: Download the .fasta file and do the formatting myself -> very slow, and the information is incomplete.

Solution 2: Use an API client such as BioServices -> still very slow, but gives more information.

Is there an already-curated table, containing the UniProt entries and as much of their metadata as possible, that can be downloaded right away?

uniprot • 1.2k views
2.8 years ago
Michael 55k

The closest you will get is UniProtKB's Download page. If you download the Swiss-Prot and TrEMBL databases, you will get a lot of data. I don't think a complete public dump of UniProtKB exists that contains everything from the underlying SQL databases. I also don't think loading everything into one large dataframe would make much sense: it would be redundant and would likely not fit into memory, which is where R objects reside. The gzip-compressed TrEMBL data alone are already a 140 GB download.
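Since the memory limit is the main obstacle, one option is to stream a downloaded flat file one entry at a time and write out only the columns you need, instead of building a dataframe. Below is a minimal Python sketch of that idea, assuming the standard UniProtKB flat-file (.dat) layout with two-letter line codes (ID, AC, OS, SQ) and `//` as the entry terminator. The function names `iter_records` and `flatfile_to_tsv` and the choice of columns are hypothetical, picked for illustration only.

```python
import csv
import gzip

FIELDS = ["entry_name", "accessions", "organism", "sequence"]


def _empty_record():
    return {"entry_name": "", "accessions": [], "organism": [], "sequence": []}


def iter_records(handle):
    """Yield one dict per entry from a UniProtKB flat-file text stream.

    Only one entry is held in memory at a time.
    """
    record = _empty_record()
    in_sequence = False
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):  # end of the current entry
            yield {
                "entry_name": record["entry_name"],
                "accessions": ";".join(record["accessions"]),
                "organism": " ".join(record["organism"]).rstrip("."),
                "sequence": "".join(record["sequence"]),
            }
            record = _empty_record()
            in_sequence = False
            continue
        code, value = line[:2], line[5:]
        if in_sequence:
            # Sequence lines are indented and split into space-separated blocks.
            record["sequence"].append(value.replace(" ", ""))
        elif code == "ID":
            record["entry_name"] = value.split()[0]
        elif code == "AC":
            record["accessions"].extend(
                acc.strip() for acc in value.split(";") if acc.strip()
            )
        elif code == "OS":
            record["organism"].append(value.strip())
        elif code == "SQ":
            in_sequence = True


def flatfile_to_tsv(in_path, out_path):
    """Convert a gzipped flat file to a TSV table without loading it whole."""
    with gzip.open(in_path, "rt") as src, open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for row in iter_records(src):
            writer.writerow(row)
```

Something like `flatfile_to_tsv("uniprot_sprot.dat.gz", "uniprot_sprot.tsv")` would then produce a table that can be loaded in chunks or imported into SQL. Other line codes (DE, GN, DR and so on) could be added in the same way if you need more metadata.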

2.8 years ago
Eugenio ▴ 10

You can also connect directly via FTP: ftp://ftp.uniprot.org/
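For a scripted download, a rough Python sketch using the standard-library `ftplib` could look like this. The host comes from the URL above; the directory in `RELEASE_DIR` and the example file name are assumptions about the server layout, so browse the FTP site first and adjust them. The helper names `remote_path` and `download` are hypothetical.

```python
import posixpath
from ftplib import FTP

HOST = "ftp.uniprot.org"
# Assumed location of the complete UniProtKB files; check it on the server.
RELEASE_DIR = "/pub/databases/uniprot/current_release/knowledgebase/complete"


def remote_path(filename, release_dir=RELEASE_DIR):
    """Build the full remote path for a file in the release directory."""
    return posixpath.join(release_dir, filename)


def download(filename, dest, host=HOST):
    """Fetch one file over anonymous FTP and write it to `dest` as it arrives."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        with open(dest, "wb") as out:
            ftp.retrbinary("RETR " + remote_path(filename), out.write)
```

Calling `download("uniprot_sprot.dat.gz", "uniprot_sprot.dat.gz")` would then save the file locally, provided that file name exists under the assumed directory.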
