I have a dataset of proteins that I have BLASTed against the UniProtKB/Swiss-Prot database.
I'd now like to identify which proteins are likely to have a mitochondrial subcellular localisation, based on the subcellular localisation of their best BLAST hit in Swiss-Prot.
The FASTA headers of the UniProt proteins look like this:
">sp|Q64602|AADAT_RAT Kynurenine/alpha-aminoadipate aminotransferase, mitochondrial OS=Rattus norvegicus GN=Aadat PE=1 SV=1"
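Whatever the mapping file ends up being, the key to join on is the accession (`Q64602` above), not the description. Below is a minimal sketch of pulling the fields out of a header of this shape. It assumes the `db|accession|entry_name description KEY=value ...` layout shown above, and that a tag always starts with a space, two capital letters and `=`. `UniProtHeader` and `parse_uniprot_header` are hypothetical names, not part of any library.

```python
import re
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UniProtHeader:
    db: str            # "sp" (Swiss-Prot) or "tr" (TrEMBL)
    accession: str     # e.g. Q64602, the key to join against annotation files
    entry_name: str    # e.g. AADAT_RAT
    description: str   # free-text protein name
    tags: Dict[str, str] = field(default_factory=dict)  # OS=, GN=, PE=, SV=, ...


# ">" is optional so that bare IDs such as "sp|Q64602|AADAT_RAT" are accepted too
_PREFIX = re.compile(r"^>?(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")
# Assumption: a tag begins at a space followed by two capital letters and "="
_TAG_SPLIT = re.compile(r"\s(?=[A-Z]{2}=)")


def parse_uniprot_header(header: str) -> UniProtHeader:
    m = _PREFIX.match(header.strip())
    if m is None:
        raise ValueError(f"not a UniProtKB FASTA header: {header!r}")
    db, accession, entry_name, rest = m.groups()
    # First chunk is the description; the remaining chunks are KEY=value tags
    description, *tag_parts = _TAG_SPLIT.split(rest)
    tags = dict(part.split("=", 1) for part in tag_parts)
    return UniProtHeader(db, accession, entry_name, description, tags)


if __name__ == "__main__":
    h = parse_uniprot_header(
        ">sp|Q64602|AADAT_RAT Kynurenine/alpha-aminoadipate aminotransferase, "
        "mitochondrial OS=Rattus norvegicus GN=Aadat PE=1 SV=1"
    )
    print(h.accession, h.tags["GN"])  # Q64602 Aadat
```

The optional `>` means the same function should also accept a subject ID such as `sp|Q64602|AADAT_RAT`, which is how hits may appear in tabular BLAST output. Note that the word "mitochondrial" in the description is only a naming convention and not a localisation annotation, so it is not a reliable filter on its own.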
I have found a Gene Ontology mapping file, ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/uniprotkb_sl2go, but the FASTA headers don't contain the GO IDs needed to use it.
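As its name suggests, `uniprotkb_sl2go` maps UniProtKB subcellular-location (SL) terms to GO, not protein accessions to GO. A small parser for it could look like the sketch below. It assumes that comment lines start with `!` and that every other line has the shape `UniProtKB-SubCell:SL-NNNN <location name> > GO:<term name> ; GO:NNNNNNN`. It keys the result by location name on the assumption that per-protein annotations give names rather than SL IDs. `parse_sl2go` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_sl2go(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map a subcellular-location name to a list of (GO ID, GO term name) pairs."""
    mapping: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):  # blank or header/comment line
            continue
        left, sep, right = line.partition(" > ")
        if not sep:
            continue  # skip anything that does not match the assumed layout
        _sl_id, _, sl_name = left.partition(" ")      # drop "UniProtKB-SubCell:SL-NNNN"
        go_part, _, go_id = right.rpartition(" ; ")   # "GO:<name>" and "GO:NNNNNNN"
        go_name = go_part[3:] if go_part.startswith("GO:") else go_part
        mapping[sl_name].append((go_id.strip(), go_name))
    return dict(mapping)


if __name__ == "__main__":
    # Assumes the file has been downloaded locally under its original name
    with open("uniprotkb_sl2go") as fh:
        sl2go = parse_sl2go(fh)
    print(len(sl2go), "location terms mapped to GO")
```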
Is there some intermediate file that I need to use, and does anyone know where to find it? Any help would be appreciated.
Thanks, that's awesome.
I changed the protocol to HTTPS; otherwise the response could be empty, because UniProt has moved to HTTPS and now answers plain HTTP requests with a "document moved" header.
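For anyone scripting the same downloads in Python, here is a minimal sketch of the same fix: rewrite `http://` to `https://` before requesting, and treat an empty body as an error instead of silently parsing nothing. `force_https` and `fetch_text` are hypothetical helpers. Python's `urllib.request.urlopen` already follows 301/302 redirects by default, whereas a client that does not follow redirects only sees the redirect response and not the data.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def force_https(url: str) -> str:
    """Return the URL with an http scheme upgraded to https; other schemes are untouched."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL over HTTPS and fail loudly if the body is empty."""
    with urlopen(force_https(url), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    if not body:
        raise RuntimeError(f"empty response from {url}")
    return body
```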
Yes, I saw it this morning! :-D https://github.com/lindenb/jvarkit/commit/5f6b66bc05201d2d543e1b1214640dd5c84051f8