Question

Forum: What are some areas of bioinformatics that are overdone?

4.4 years ago

Jeremy Leipzig 23k

What are some areas of bioinformatics in which people keep dishing out new papers on algorithms, overly complex statistical models, and tools where the old ones were good enough? What are areas where only marginal gains are to be found in terms of accuracy or speed, but people keep publishing because it's what they are familiar with?

bioinformatics • 1.6k views

updated 2.4 years ago by Ram 45k

Until one devises a new tool and compares it to the current crop, one cannot know whether it is significantly better. By that point one may as well go ahead and publish (if possible) so that someone else can avoid reinventing the same method.

As long as publications remain one of the main criteria for judging academic achievement (for promotions, etc.), there is not much one can do about this.

4.4 years ago by GenoMax 152k

I remember seeing a recent tweet stating that there are more methods papers for single-cell sequencing than actual research being done...

4.4 years ago by WouterDeCoster 48k

score 1 · Answer 1 · 2021-02-23

4.4 years ago

Istvan Albert 102k

I would steer away from predicting that any field holds only "marginal" gains.

Where I find surprisingly little progress being made is in validation, and in properly documenting the strengths and weaknesses of the various existing methodologies.

Take any tool that can assign RNA-Seq reads to the transcripts of a gene. How well does it work? Some isoforms may be very similar to others, while other isoforms are easy to tell apart. But there is no way to tell which assignments are reliable, or which counts are more trustworthy than others.
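To make the isoform problem concrete, here is a minimal Python sketch of one crude per-transcript reliability signal: the fraction of a transcript's compatible reads that are compatible with that transcript alone. It assumes you already have, for each read, the set of transcripts it is consistent with (similar in spirit to the equivalence classes used by pseudoaligners). The function names `unique_fraction` and `flag_ambiguous`, the input format, and the 0.2 cutoff are hypothetical choices for illustration, not part of any existing tool.

```python
from collections import defaultdict


def unique_fraction(compatibility):
    """Map each transcript to the fraction of its compatible reads that hit it alone.

    compatibility: dict of read id -> set of transcript ids the read is consistent with.
    """
    total = defaultdict(int)   # reads compatible with the transcript
    unique = defaultdict(int)  # reads compatible with that transcript only
    for transcripts in compatibility.values():
        for t in transcripts:
            total[t] += 1
            if len(transcripts) == 1:
                unique[t] += 1
    return {t: unique[t] / total[t] for t in total}


def flag_ambiguous(compatibility, threshold=0.2):
    """List transcripts whose unique-read fraction falls below the (arbitrary) threshold."""
    fractions = unique_fraction(compatibility)
    return sorted(t for t, f in fractions.items() if f < threshold)


# Hypothetical toy data: isoforms A and B are nearly identical, C is distinct.
reads = {
    "r1": {"A"},
    "r2": {"A", "B"},
    "r3": {"A", "B"},
    "r4": {"C"},
    "r5": {"A", "B"},
}

print(unique_fraction(reads))
print(flag_ambiguous(reads))
```

In the toy data, B has no uniquely assigned reads at all, so whatever count a quantifier reports for it is decided by how it splits the reads it shares with A. A low fraction does not mean the estimate is wrong, only that it rests on the model rather than on direct evidence in the reads.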

Here is another domain that I have found to be surprisingly ill-documented in this respect. Take any tool that does metagenomic classification (Kraken, Centrifuge, QIIME, etc.) and run it on data from just a few specific species. What I see is that some species are affected by major systematic errors that make the classification for that species alone incorrect, while the counts for the other species are fine. In aggregate the method "works", except that all the errors come from a few species.
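The aggregate-versus-per-species point can be illustrated with a small Python sketch that scores a classifier against known labels, as you would with simulated reads of known origin. The species names and counts below are made up, and `per_species_metrics` and `overall_accuracy` are hypothetical helpers, not functions from Kraken, Centrifuge, or QIIME.

```python
from collections import Counter


def overall_accuracy(truth, predicted):
    """Fraction of reads whose predicted species matches the true species."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def per_species_metrics(truth, predicted):
    """Return {species: (precision, recall)}; NaN where a ratio is undefined."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(truth, predicted):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # a read of species t was missed
            fp[p] += 1  # species p received a read that is not its own
    metrics = {}
    for s in sorted(set(truth) | set(predicted)):
        called = tp[s] + fp[s]   # reads the classifier assigned to s
        present = tp[s] + fn[s]  # reads that truly came from s
        precision = tp[s] / called if called else float("nan")
        recall = tp[s] / present if present else float("nan")
        metrics[s] = (precision, recall)
    return metrics


# Hypothetical simulated sample: every sp_C read is misassigned to sp_A.
truth = ["sp_A"] * 45 + ["sp_B"] * 45 + ["sp_C"] * 10
predicted = ["sp_A"] * 45 + ["sp_B"] * 45 + ["sp_A"] * 10

print(f"overall accuracy: {overall_accuracy(truth, predicted):.2f}")
for species, (prec, rec) in per_species_metrics(truth, predicted).items():
    print(f"{species}: precision={prec:.2f} recall={rec:.2f}")
```

Here overall accuracy is 0.9, which looks acceptable, yet recall for sp_C is zero and sp_A's abundance is inflated by reads that are not its own. Only the per-species breakdown exposes that the errors are concentrated in one species.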

The same goes for differential expression: there is no method to check whether the statistical methods are appropriate. People choose DESeq or edgeR just because one seems to work "better", which is, in the end, an unscientific approach.

What I'd like is a tool that tells me: hey, for this particular transcript, species, DNA, etc., the results you'll get are not so great.

so the Genome Comparison & Analytical Testing (GCAT) toolkit on steroids?

4.4 years ago by Jeremy Leipzig 23k

Well... GCAT has become a cautionary tale of even bigger problems: the published site URL is no longer operational:

http://www.bioplanet.com/gcat-benchmarking-tool/

If I Google "Genome Comparison & Analytical Testing (GCAT)", then, for me, the first hit leads to a site that attempts to install malware...

4.4 years ago by Istvan Albert 102k

Arpeggi was bought by Gene-by-Gene, which was then bought by Family Tree DNA, but GCAT was pretty much a killer variant-calling benchmarking app that generated a ton of interest. It would be interesting to know which benchmarks would elicit a similar turnout today (it might still be variant calling).

4.4 years ago by Jeremy Leipzig 23k