Hello,
I have a list of proteins with their UniProt IDs from an MS experiment (with no quantitative data). From this list, I need to explore the protein-protein interaction network using the STRING database. However, there are some proteins that STRING could not identify ("Sorry, STRING found no proteins by this name in Homo sapiens"). I tried other identifiers for these proteins from databases such as GeneCards and Ensembl, but STRING still could not find them.
Instead of these proteins, STRING returned similar proteins and/or paralogs of them.
For example, I have these four proteins:
Protein in my dataset ---> STRING output
Q5JXB2 (UBE2NL) ---> P61088 (UBE2N)
P0CG22 (DHRS4L1) ---> Q9BTZ2 (DHRS4)
Q5T1J5 (CHCHD2P9) ---> Q9Y6H1 (CHCHD2)
P0C7P4 (UQCRFS1P1) ---> P47985 (UQCRFS1)
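To make the mismatch explicit, here is a minimal Python sketch that flags rows where the protein STRING returned is not the one I submitted. The function name `flag_substitutions` and the hard-coded list are only for illustration; the data are just the four rows above, and nothing here queries STRING itself.

```python
# Rows copied from the list above:
# (submitted UniProt ID, submitted gene symbol, STRING UniProt ID, STRING gene symbol)
MAPPINGS = [
    ("Q5JXB2", "UBE2NL", "P61088", "UBE2N"),
    ("P0CG22", "DHRS4L1", "Q9BTZ2", "DHRS4"),
    ("Q5T1J5", "CHCHD2P9", "Q9Y6H1", "CHCHD2"),
    ("P0C7P4", "UQCRFS1P1", "P47985", "UQCRFS1"),
]


def flag_substitutions(mappings):
    """Return the rows where the returned protein differs from the submitted one.

    A row is flagged if either the accession or the gene symbol
    (compared case-insensitively) does not match.
    """
    flagged = []
    for query_id, query_symbol, hit_id, hit_symbol in mappings:
        same_id = query_id == hit_id
        same_symbol = query_symbol.upper() == hit_symbol.upper()
        if not (same_id and same_symbol):
            flagged.append((query_id, query_symbol, hit_id, hit_symbol))
    return flagged


if __name__ == "__main__":
    for query_id, query_symbol, hit_id, hit_symbol in flag_substitutions(MAPPINGS):
        print(f"{query_id} ({query_symbol}) was mapped to {hit_id} ({hit_symbol})")
```

All four rows are flagged, so none of these proteins was matched to itself.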
I am now unsure what to do in this situation. Should I remove these four proteins from my dataset and leave them out of the analysis, since STRING cannot identify them? Should I keep them and accept STRING's mapping to UBE2N, DHRS4, CHCHD2 and UQCRFS1? Or is there another way to handle this that I am not aware of?
Could you please advise me on the best approach here? Any advice and suggestions would be highly appreciated.
Many thanks.
Best wishes, Farah
Hi Damian, thank you so much for your great explanation and clarification. My understanding now is that I should remove from my proteomic analysis those genes that are either processed transcripts or pseudogenes and keep only protein-coding genes, since STRING does not contain transcripts or pseudogenes and therefore could not identify them. Could you also advise me on some other cases? For example, my MS list contains two NEDD4L entries with different IDs, and I do not know which one to keep and which to remove. I also do not know what "(fragment)" means for K7ENS6.
Also, for PRSS2, STRING returns PRSS3P2, which seems to be a different protein.
I would highly appreciate your great help. Best, Farah