Hello all,
I was wondering whether there are any good new tools that predict missense effects in non-model species on a genome scale. PROVEAN seems to predict only one protein at a time, which is a poor option for 10-20k proteins given the current size of the NR database (which would have to be loaded into memory every time).
There is also the SIFT4G software, but it seems abandoned and has not been easy to understand.
EDIT: On the other hand, sift4g actually works really well; it was just confusing to figure out among the various tools they have built around it. The core sift4g utility operates on principles similar to PROVEAN, SIFT, and PolyPhen, but is much faster thanks to a custom alignment step, CUDA acceleration, and good memory management.
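For anyone unfamiliar with the SIFT principle mentioned above: each alignment column is turned into amino-acid probabilities, each probability is divided by that of the most frequent residue, and substitutions scoring below 0.05 are predicted deleterious. The sketch below is a deliberately simplified illustration, assuming raw counts with gaps ignored; real SIFT/SIFT4G also applies sequence weighting and Dirichlet-mixture pseudocounts, and the function names here are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sift_like_scores(column: str) -> dict[str, float]:
    """Simplified SIFT-style scores for one alignment column.

    Hypothetical helper: uses raw residue frequencies (no sequence
    weighting, no pseudocounts) and ignores gaps / non-standard residues.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    probs = {aa: counts[aa] / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    # Normalize by the most frequent residue, so that residue scores 1.0.
    return {aa: p / top for aa, p in probs.items()}


def is_deleterious(column: str, substitute: str, threshold: float = 0.05) -> bool:
    """Predict a substitution as deleterious if its score falls below the threshold."""
    return sift_like_scores(column)[substitute.upper()] < threshold


# A column that is 90% leucine and 10% isoleucine across homologs.
column = "LLLLLLLLLI"
print(is_deleterious(column, "I"))  # isoleucine has been observed here
print(is_deleterious(column, "D"))  # aspartate never appears in this column
```

Without pseudocounts, any unobserved residue scores exactly 0, which is one reason the real tools smooth the column counts.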
Has there been any good progress in the field? I would appreciate any suggestions and pointers.
It turns out there are lots of sequence-based tools I didn't know about; see here. From what I can tell by briefly inspecting their main pages, all of them are based on position-specific features extracted from PSI-BLAST or similar searches. That also doesn't scale well to tens of thousands of proteins.
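To make "position-specific features" concrete: the typical feature is a position-specific scoring matrix (PSSM), i.e. per-column log-odds of each amino acid relative to a background. The sketch below builds one directly from an existing multiple sequence alignment, assuming a uniform background and simple additive pseudocounts; PSI-BLAST instead uses empirical background frequencies, sequence weighting, and more elaborate pseudocounts, so treat this only as an illustration. The expensive part in practice is producing the alignment, not this computation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_from_msa(msa: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds PSSM from an aligned set of sequences (hypothetical helper).

    Assumes all sequences have equal length, a uniform background of 1/20,
    and an additive pseudocount per amino acid; gaps are ignored per column.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for i in range(length):
        column = [seq[i].upper() for seq in msa if seq[i].upper() in AMINO_ACIDS]
        n = len(column)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / (n + pseudocount * len(AMINO_ACIDS))
            row[aa] = math.log2(freq / background)
        matrix.append(row)
    return matrix


def substitution_feature(pssm: list[dict[str, float]], pos: int, ref: str, alt: str) -> float:
    """Score change for ref -> alt at a 0-based position (negative = less favored)."""
    return pssm[pos][alt] - pssm[pos][ref]


msa = ["MKL", "MKI", "MRL", "M-L"]
pssm = pssm_from_msa(msa)
print(round(substitution_feature(pssm, 1, "K", "R"), 3))
print(round(substitution_feature(pssm, 1, "K", "W"), 3))
```

Features like `substitution_feature` for every variant are what a downstream classifier would consume.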
If I were to start this project, I'd try to develop a classifier based on some approach that can generate position-specific features quickly, as that appears to be the bottleneck. Something like this or this could work.
Yes, there are many sequence-based tools, but not many are geared (and calibrated) toward predicting effects on a whole-genome scale. That's what I meant. Thank you for your help, though.