I have a list of proteins which are identified by what the author of the list called "Protein Accession Number".
When I search for this number in the NCBI protein search, I do find the protein.
For example, the protein accession number 29436380 returns the "MYH9 protein", and the number appears in the page address: https://www.ncbi.nlm.nih.gov/protein/29436380
I have looked all over but couldn't find out what this identifier really is. I would like to use my list in R with other Bioconductor packages, and for that I need to match it with UniProt IDs. I can't find any package to do this, mainly because I don't know what this kind of identifier is called.
Can anyone help me by either pointing me to a package that can do this mapping or telling me the name of this "accession number"?
I'm not entirely sure, but biomaRt might be useful to you; its documentation is here: https://www.bioconductor.org/packages/devel/bioc/vignettes/biomaRt/inst/doc/biomaRt.html. In your case you would need to explore the package and check whether biomaRt has an "attribute" for GIs (just as it does for ensembl_gene_ids and uniprotswissprot IDs, for example). If it does, you can query NCBI proteins with your list of GIs (I have a feeling that you might also need the sequence accession, but I might be wrong) and retrieve metadata or sequence information.
Thanks a lot manaswwm!!! You pointed me in the right direction. What I finally did was scrape the NCBI site to download the relevant information. I still have the problem that some records were removed or are obsolete, but there is no way around that other than getting the information manually.
I'm leaving the code I used to retrieve the relevant information here in case it helps someone.
You only need to pass the GI accession number to the "extractInfo" function and voilà! I vectorized the function because I needed to use it in a dplyr pipe.
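The original R code is not reproduced in this thread, so what follows is not the poster's function. It is a minimal Python sketch of the same idea that swaps page scraping for NCBI's E-utilities `esummary` endpoint. The function names (`build_esummary_url`, `parse_esummary`, `extract_info`) are hypothetical, and the JSON field names `accessionversion` and `title` are assumptions about the response layout that should be checked against a real response. Removed or obsolete records, as mentioned above, come back here with empty fields instead of stopping the run.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(gi_numbers):
    """Build an esummary request URL for a list of protein GI numbers."""
    params = {
        "db": "protein",
        "id": ",".join(str(gi) for gi in gi_numbers),
        "retmode": "json",
    }
    return ESUMMARY_URL + "?" + urlencode(params)


def parse_esummary(payload, gi_numbers):
    """Turn a decoded esummary JSON payload into one row per requested GI.

    Assumption: records sit under payload["result"][<gi>] and carry the
    fields "accessionversion" and "title". GIs without a usable record
    are kept as rows with None values so they can be reviewed by hand.
    """
    result = payload.get("result", {})
    rows = []
    for gi in gi_numbers:
        record = result.get(str(gi))
        if record is None or "error" in record:
            rows.append({"gi": str(gi), "accession": None, "title": None})
        else:
            rows.append({
                "gi": str(gi),
                "accession": record.get("accessionversion"),
                "title": record.get("title"),
            })
    return rows


def extract_info(gi_numbers):
    """Fetch and parse summaries for several GI numbers (network access needed)."""
    with urlopen(build_esummary_url(gi_numbers)) as response:
        payload = json.load(response)
    return parse_esummary(payload, gi_numbers)
```

Calling `extract_info([29436380])` would send one request for the example identifier from the question. The accession it returns, if any, is the value to carry into a UniProt mapping step.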
It looks like a GI accession (an identifier consisting only of digits). More on them here: https://www.ncbi.nlm.nih.gov/genbank/sequenceids/
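To separate GI numbers from other identifiers in a mixed list, a rough check along these lines may help. This is a sketch only: the function name `classify_identifier` is hypothetical, and the accession pattern is a deliberate simplification (letters, an optional underscore, digits, an optional version suffix) that will not cover every identifier format. The strings in the usage note are format examples, not verified records.

```python
import re

# Simplified accession pattern: 1-6 letters, optional underscore,
# digits, and an optional ".version" suffix. Not an exhaustive rule.
_ACCESSION = re.compile(r"[A-Za-z]{1,6}_?[0-9]+(\.[0-9]+)?")
_DIGITS_ONLY = re.compile(r"[0-9]+")


def classify_identifier(identifier):
    """Return "gi" for digits-only identifiers, "accession" for
    accession-like strings, and "unknown" for anything else."""
    text = str(identifier).strip()
    if _DIGITS_ONLY.fullmatch(text):
        return "gi"
    if _ACCESSION.fullmatch(text):
        return "accession"
    return "unknown"
```

With this sketch, `classify_identifier("29436380")` returns `"gi"`, while a string shaped like `"NP_002464.1"` is reported as `"accession"`.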
Thanks a lot, you pointed me in the right direction!!!