Question

Forum: Why are people not working on gene prioritization and candidate gene selection for new experiment design?

0

6.2 years ago

prasadhendre ▴ 20

Selecting good candidate genes lays solid ground for biological experimentation and for systems and synthetic biology, but why are scientists not developing a good approach for gene prioritization (GP)? Existing tools don't really bring GO terms or pathways into GP; they focus primarily on PPIs. Are we not losing a lot of other available information?

gene-prioritization • 2.1k views

ADD COMMENT • link updated 18 months ago by Ram 44k • written 6.2 years ago by prasadhendre ▴ 20

3

That's a bit of a false premise in your headline, because some researchers and groups are certainly working on gene prioritization, but let's not get distracted by that.

ADD REPLY • link 6.2 years ago by Michael 55k

1

I think just using DGE results and selecting genes based on fold-change is not the most appropriate method. Literature searches always lead to the selection of KNOWN candidates, with very little left to discover. Most of these tools are PPI-based and don't really give good guidance for selection. It is very difficult to understand why the first is first and the last is last. Shouldn't there be a method that gives a handle like a p-value for selecting genes and, if possible, uses all the available information in terms of GO, pathways and interactions?

ADD REPLY • link 6.2 years ago by prasadhendre ▴ 20

1

99% of the time we only care about known and well-characterized genes. Few labs want to dedicate the resources needed to characterize little-known genes; they have other goals.

ADD REPLY • link 6.2 years ago by Devon Ryan 104k

2

"99% of the time we only care about known and well characterized genes."

I think that is the core of the problem. I remember there was a paper saying that we are still working on pretty much the same gene set as before the 'genomics' era.

ADD REPLY • link 6.2 years ago by Michael 55k

2

No disagreement there. We need to fund characterization of a broader set of genes; there are a lot of known unknowns out there.

ADD REPLY • link 6.2 years ago by Devon Ryan 104k

1

This is the paper you are thinking of: https://academic.oup.com/bioinformatics/article/34/12/2087/4816110

ADD REPLY • link 6.2 years ago by GenoMax 147k

1

Wasn't it this paper? https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2006643

ADD REPLY • link 6.2 years ago by Lluís R. ★ 1.2k

0

Thanks, this is hot off the press, just published!

ADD REPLY • link 6.2 years ago by prasadhendre ▴ 20

1

Yes, just having the results of differential expression is not the best. A differential expression analysis is just a basic, standard analysis, from my perspective. However, I disagree that we should be using GO terms and pathways because, in my experience, they introduce more noise and confusion into an analysis. My conversations with numerous researchers have independently confirmed this. One colleague, a professor in London, referred to DAVID as 'pseudo-science'.

What comes after a differential expression analysis is follow-up studies and predictive model building. I note that a substantial portion of researchers lack the ability to see the positive impact of their research on society (or they just do not think about it because they do not have to). As a result, some research groups appear 'directionless' and may constantly seek out novel things to do for publication purposes, as opposed to really aiming to make a difference and have a positive impact on society with their research.

'Machine learning' and AI are further algorithms that are chucking further noise into biological data... like bad fashion trends (like those purposefully torn and colour-faded jeans).

ADD REPLY • link 6.2 years ago by Kevin Blighe 88k

0

I don't think GO terms and pathways introduce noise; it is just that we don't know how to reduce the noise. Non-gene-centric classification systems are difficult to interpret and to extrapolate to the gene level. ML and AI are, moreover, atypical science for a 'statistical' biologist.

ADD REPLY • link 6.2 years ago by prasadhendre ▴ 20

0

This comment was meant for the next comment.

ADD REPLY • link 6.2 years ago by prasadhendre ▴ 20

score 3 · Answer 1 · 2018-09-24

3

6.2 years ago

Devon Ryan 104k

Honestly, few people are using any tools other than the output of their DE analysis and maybe a bit of pathway/GO enrichment plus a solid understanding of the literature. For most basic science researchers, it's questionable whether anything that doesn't involve training a computer on the relevant literature would be useful, since determining what follow-up experiments to do is pretty much a task that only humans should be doing at this point (unless they're in the highly limited context of doing drug screens or something like that).

ADD COMMENT • link 6.2 years ago by Devon Ryan 104k

0

That's why scientists revolve around the same genes again and again.

ADD REPLY • link 6.2 years ago by prasadhendre ▴ 20

0

Correct, it's a known issue that won't be changed until funding structures are altered.

ADD REPLY • link 6.2 years ago by Devon Ryan 104k

score 3 · Answer 2 · 2018-09-24

Let me play the devil's advocate, as much as I understand Devon's sentiment.

"Can a Machine do better in selecting candidates than a human?"

Human decision making is not based on objective criteria, as hard as we may try, but on subjective preference, like 'interest', and in particular relies on 'homology'. One could argue that humans should not be allowed to select candidate genes for a routine test pipeline, when there are many more potential candidates than slots in the wet-lab validation pipeline.

In many cases we already trust machines (a.k.a. algorithms) to make essential decisions on a set of candidates. Imagine you are looking for information on the web: would you visit all documents and inspect them manually? No! We let Google rank the documents for us, and we often trust the result (only ever using the first page or so), possibly too much, as we do not know the intricacies of its algorithm. There are many areas where machine learning and data mining are applied or will be in the future; think of automatic ranking of applicants, matching of partners, etc.

Formalizing selection criteria into an algorithm could be particularly useful to make the biological assumptions explicit, more than the subsequent ranking process itself.

Missing Objective Criteria and Gold Standard

Candidate selection or gene prioritization with respect to what? The problem is possibly the lack of objective criteria, and also for most cases, the objectives will be different. Examples:

Rank candidates for causative mutations
Rank candidates for vaccine targets

Most likely you will not be able to use the same algorithm or heuristics for both.

"but why are scientists not developing a [Me: as in single] good approach for gene prioritization (GP)?"

Because there is no single scoring function that solves all problems, a single best approach will not work across different settings. As in any regression or machine learning application, the choice of features is crucial, often more so than the choice of algorithm (SVM, random forest, linear regression, etc.).
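
To make this concrete, here is a minimal Python sketch in which the same genes are ranked under two different objectives simply by changing the feature weights. Everything in it is hypothetical and for illustration only: the feature names (`conservation`, `expression`, `surface_exposed`), the weights, and the gene values are assumptions, not taken from any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    conservation: float     # 0..1, hypothetical cross-species conservation
    expression: float       # 0..1, hypothetical normalized expression
    surface_exposed: float  # 0..1, hypothetical chance of surface exposure


# Hypothetical weights: each objective values the same features differently.
WEIGHTS = {
    "causative_mutation": {
        "conservation": 0.7, "expression": 0.3, "surface_exposed": 0.0,
    },
    "vaccine_target": {
        "conservation": -0.4, "expression": 0.2, "surface_exposed": 0.8,
    },
}


def score(gene, objective):
    """Weighted sum of the gene's features for the given objective."""
    weights = WEIGHTS[objective]
    return sum(w * getattr(gene, feature) for feature, w in weights.items())


def rank(genes, objective):
    """Gene names sorted from highest to lowest score."""
    ordered = sorted(genes, key=lambda g: score(g, objective), reverse=True)
    return [g.name for g in ordered]


genes = [
    Gene("geneA", 0.9, 0.8, 0.1),
    Gene("geneB", 0.2, 0.5, 0.9),
    Gene("geneC", 0.6, 0.1, 0.5),
]

for objective in WEIGHTS:
    print(objective, rank(genes, objective))
```

With these made-up numbers the two objectives put the genes in opposite orders, which is the point: the features and their weights encode the biological assumptions, and they change with the question being asked.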

Then, how do you score your genes and evaluate whether the ranking is optimal?

A possible approach is to define a very limited setting, e.g. prioritize the genes that will yield a phenotype on knock-out. This is particularly important where the capacity to test candidates is limited to a fraction of all genes, as with some non-model organisms (e.g. the salmon louse we are working with).
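
In such a limited setting, one way to evaluate a ranking is to ask how many of the top-ranked genes actually gave a phenotype, and how likely that many hits would be under random selection. The sketch below is a generic illustration, not a method from this thread: it computes precision at k and a hypergeometric tail probability. The gene names and outcomes are invented, and it assumes the knock-out outcome is known for every gene in the ranked list, which is rarely true in practice.

```python
from math import comb


def precision_at_k(ranked_genes, positives, k):
    """Fraction of the top-k ranked genes that gave a knock-out phenotype."""
    top = ranked_genes[:k]
    return sum(g in positives for g in top) / k


def hypergeom_tail(n_total, n_positive, k, hits):
    """P(at least `hits` positives among k genes drawn at random
    without replacement from n_total genes, n_positive of them positive)."""
    upper = min(k, n_positive)
    favourable = sum(
        comb(n_positive, i) * comb(n_total - n_positive, k - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(n_total, k)


# Hypothetical ranking of 10 tested genes; 3 of them gave a phenotype.
ranked = [f"g{i}" for i in range(1, 11)]
positives = {"g1", "g2", "g5"}

k = 3
hits = sum(g in positives for g in ranked[:k])
print(precision_at_k(ranked, positives, k))
print(hypergeom_tail(len(ranked), len(positives), k, hits))
```

The tail probability gives the kind of p-value-like handle asked for earlier in the thread, but only for the ranking as a whole, and only relative to whatever gold standard is available.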

A slightly simplified example: we need new drug targets in Lepeophtheirus salmonis because it is an important salmon parasite, but we can test maybe 100 genes per year. If we wanted to knock down all ~13k genes at least once, we would need 130 years. We need to prioritize our selection to increase the number of successful knock-outs; if we don't get a phenotype out of an experiment, we haven't really learned anything. If we simply choose core metabolic or other lethal genes, these are often highly conserved and therefore maybe not so interesting (the host and other crustaceans in the sea possibly have them too). We need to rescue the genes that are unique or have little or no homology; people often pick genes that are well annotated by homology, so those without homology remain understudied.
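
The arithmetic in this example can be written down in a few lines of Python. The gene count and yearly capacity come from the example above; the two hit rates are purely hypothetical placeholders, included only to show why raising the fraction of knock-downs that yield a phenotype matters more than raw throughput.

```python
import math


def years_to_screen(n_genes, tests_per_year):
    """Years needed to test every gene once at a fixed yearly capacity."""
    return math.ceil(n_genes / tests_per_year)


def expected_hits(tests_per_year, hit_rate):
    """Expected number of experiments per year that yield a phenotype."""
    return tests_per_year * hit_rate


N_GENES = 13_000        # ~13k genes, from the example above
TESTS_PER_YEAR = 100    # yearly capacity, from the example above

print(years_to_screen(N_GENES, TESTS_PER_YEAR))  # 130

# Hypothetical hit rates, not measured values.
for label, hit_rate in [("unprioritized", 0.05), ("prioritized", 0.20)]:
    print(label, expected_hits(TESTS_PER_YEAR, hit_rate))
```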

Currently, it seems that network-based approaches built on gene expression data can outperform human-expert-based selection by a vast margin, but this is hard to prove, because validating the candidates requires experiments.
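
For readers unfamiliar with the idea, the following is a toy guilt-by-association sketch of what a network-based ranking from expression data might look like. It is not the approach referred to above, whose details are not given here: the expression profiles, the seed genes (genes assumed to have a known phenotype), and the correlation threshold of 0.8 are all made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_by_coexpression(expression, seeds, threshold=0.8):
    """Rank non-seed genes by the number of seed genes they are linked to,
    where a link means |Pearson r| >= threshold. Ties are broken by name."""
    scores = {}
    for gene, profile in expression.items():
        if gene in seeds:
            continue
        scores[gene] = sum(
            abs(pearson(profile, expression[s])) >= threshold for s in seeds
        )
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical expression profiles across four samples.
expression = {
    "seed1": [1.0, 2.0, 3.0, 4.0],
    "seed2": [2.0, 4.0, 6.0, 8.5],
    "cand1": [1.1, 2.1, 2.9, 4.2],
    "cand2": [4.0, 1.0, 3.0, 2.0],
    "cand3": [5.0, 5.0, 5.0, 5.0],
}

ranking = rank_by_coexpression(expression, seeds={"seed1", "seed2"})
print(ranking)
```

Note that a candidate needs no sequence homology or annotation to rank highly here, only an expression profile, which is what makes this family of methods attractive for understudied genes.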