Question

Genome-scale missense variant effect predictors?

0

4.7 years ago

predeus ★ 2.1k

Hello all,

I was wondering whether there are any really good new tools that predict missense variant effects in non-model species on a genome scale. Provean seems to predict only one protein at a time, which is a very poor option for 10-20k proteins given the current size of the NR database (it would have to be loaded into memory for every run).

There is also the SIFT4G software, but it seems abandoned and has not been the easiest to understand either. EDIT: on the other hand, sift4g actually works really well; it was just confusing to sort out the various tools built around it. The core sift4g utility operates on principles similar to provean/sift/polyphen, but is much faster thanks to custom alignment, CUDA usage, and good memory management.

Has there been any good progress in the field? I would appreciate any suggestions and pointers.

provean sift polyphen missense sift4g • 1.1k views

score 0 · Answer 1 · 2020-04-07

0

4.7 years ago

Mensur Dlakic ★ 28k

I am not sure there is anything out there that scales to your needs. In case you haven't tried it, Missense3D may be of interest. Beware that it works with structures, so it is not fast by definition. The same is true for Rosetta.

1

It turns out there are lots of sequence-based tools I didn't know about - see here. From what I can tell by briefly inspecting their main pages, all of them are based on position-specific features extracted from PSI-BLAST or similar searches. That also doesn't scale well to tens of thousands of proteins.

If I were to start this project, I'd try to develop a classifier based on some approach that can generate position-specific features quickly, as that appears to be the bottleneck. Something like this or this could work.
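
To make "position-specific features" a bit more concrete, here is a minimal sketch of the general idea. It is not taken from any of the tools mentioned in this thread. It assumes an alignment is already available as a list of equal-length strings, and the toy sequences, the function names, and the pseudocount of 1 are all made up for illustration. A real pipeline would still need a fast way to build the alignments, which is the actual bottleneck.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profile(column, pseudocount=1.0):
    """Smoothed amino-acid frequencies for one alignment column (gaps ignored)."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def position_features(alignment):
    """Per-position frequency profile and Shannon entropy in bits."""
    features = []
    for column in zip(*alignment):  # iterate over alignment columns
        freqs = column_profile(column)
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        features.append({"freqs": freqs, "entropy": entropy})
    return features

def substitution_score(features, pos, ref, alt):
    """Log2 ratio of alt to ref frequency at a 0-based position.

    More negative means the alternative residue is rarer in that column.
    """
    freqs = features[pos]["freqs"]
    return math.log2(freqs[alt] / freqs[ref])

# Hypothetical toy alignment, query sequence first.
alignment = ["MKVLA", "MKVIA", "MRVLA", "MKVLG"]
feats = position_features(alignment)

# A change at the fully conserved first column vs. a change already seen in column 4.
print(substitution_score(feats, 0, "M", "W"))
print(substitution_score(feats, 3, "L", "I"))
```

Features like these (per-column frequencies, entropy, log-ratio of the substitution) are the kind of input a classifier could be trained on. Whether they would be predictive enough without a deep homology search is an open question.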

Reply • 4.7 years ago by Mensur Dlakic ★ 28k

0

Yes, there are many sequence-based tools, but not many are geared (and calibrated) towards predicting effects on a whole-genome scale. That's what I meant. Thank you for your help though.

Reply • 4.7 years ago by predeus ★ 2.1k