Hello,
I need to predict Gene Ontology (GO) annotations for a dataset of approximately one and a half million sequences. That is too large for the OMA function prediction page at https://omabrowser.org/oma/functions/. What would be the best way to use OMA for a dataset of this size?
I'm not sure about the OMA approach (so this may not directly answer your question), but you could consider running the sequences through InterProScan. It will also assign GO terms to the input proteins. Keep in mind, though, that running 1.5M proteins through InterProScan will also take a considerable amount of time.
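One common way to make a run of this size more manageable is to split the input FASTA into smaller files and process them as separate jobs. Below is a minimal sketch of that idea in plain Python (standard library only). The function names, the chunk size of 10,000 sequences, and the output file naming are my own assumptions for illustration, not anything prescribed by InterProScan or OMA.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(in_path, out_dir, chunk_size: int = 10000) -> List[Path]:
    """Write the records of in_path into files of at most chunk_size records.

    chunk_size and the chunk_XXXXX.fasta naming are illustrative assumptions.
    Returns the list of chunk files written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    out = None
    with open(in_path) as handle:
        for i, (header, seq) in enumerate(read_fasta(handle)):
            if i % chunk_size == 0:
                # Close the previous chunk and open the next one.
                if out is not None:
                    out.close()
                path = out_dir / f"chunk_{i // chunk_size:05d}.fasta"
                out = open(path, "w")
                written.append(path)
            out.write(f">{header}\n{seq}\n")
    if out is not None:
        out.close()
    return written
```

For example, `split_fasta("proteins.fasta", "chunks")` (both names hypothetical) would produce about 150 files for 1.5M sequences, which can then be submitted independently and the results concatenated afterwards. Whether this actually speeds things up depends on how many cores or cluster nodes you have available.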
Yes, hi Lieven, thanks for your answer. I am aware of InterProScan and am currently running my sequences through it too. And yes, sadly you are right, it is taking some time.