What Are The Obsolete Research Topics In Bioinformatics?
12
28
13.2 years ago
Sirus ▴ 820

Hi all,

It may seem strange to ask about obsolete research topics in bioinformatics, but I think it is helpful for a beginner or a new researcher to know which topics are obsolete. By obsolete I mean research topics that no longer attract much interest.

• 14k views
7

Whatever this year's list is, reappraise it in 5 years, when this year's obsolete list is big news.

2

Interest in topics that are not interesting is indeed strange ;)

2

I think this should probably be community wiki; there is no right answer.

14
13.2 years ago

The first thing that comes to mind is protein secondary structure prediction. It seems pretty clear to me that there have been no big performance improvements in the last 5 years (if not 10). The field seems to have reached a plateau years ago, so developing a new secondary structure prediction method looks like an obsolete research topic.

5

On the other hand, there have been plenty of new tools to predict protein 'un'structure. DisProt, IUPred, RONN, etc. all predict (or catalogue) structural properties of proteins and are picking up quite a few citations. Admittedly it is the lack of secondary structure they are predicting, but isn't it kind of still the same?

5

Actually, judging from the CASP (http://predictioncenter.org/) results of recent years, there has definitely been an upward trend... I think the field is still somewhat alive. Not to forget the docking and interaction-prediction areas...

13
13.2 years ago
lh3 33k
  1. Short-read (<50bp) alignment algorithms. We will not need to align short reads as sequence reads keep getting longer. It is interesting to see how this subfield gets nearly saturated and then fades away in only two years. A few other NGS-related methods will end up like this. Algorithm development is data driven, while NGS is moving too fast.

  2. A few years ago I did a brief review of clustering algorithms, which are mostly related to microarray data and homology identification. My impression at the time was that we do not need a new microarray data clustering algorithm (homology clustering may still have some room). Nonetheless, I do not really work with microarray data, so I could be wrong on this point.

4

Reads < 50bp might still be used for microRNA libraries, for example, so these read lengths will remain in use. Nonetheless, I don't know if there is a huge need for new algorithms for this particular kind of data.

3

Short-read alignment can be useful for other applications as well, such as in primer design. But the current tools are sufficient.

11
13.2 years ago

A second unrelated obsolete research topic just occurred to me: development of improved microarray normalization methods. Mind you, I fully appreciate the importance of good normalization methods - they can make a huge difference in terms of what one can get out of a microarray study.

However, I consider it obsolete to develop new microarray normalization methods for two reasons:

  1. The methods seem to have reached the limit of what is possible. For 10 years, methods have been able to correct for just about any systematic bias in microarray data, be that the amount of sample loaded, dye effects, print tip effects, or other spatial effects.

  2. Microarrays are to a large extent being replaced by RNA-seq methods, which only adds to the obsolescence of developing new methods to normalize the data.
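To make the "amount of sample loaded" bias concrete, here is a minimal sketch of one very simple global normalization, median centering of log2 intensities. This is only an illustration chosen for brevity, not the method the answer has in mind; the function name `median_center` and the toy intensity values are made up, and it does nothing about dye, print tip, or spatial effects.

```python
from statistics import median


def median_center(log_intensities):
    """Subtract the array's median so every array ends up with median zero."""
    m = median(log_intensities)
    return [x - m for x in log_intensities]


# Toy log2 intensities for one array (hypothetical values).
array_a = [8.0, 9.0, 10.0, 11.0, 12.0]
# Same sample with twice the material loaded: every log2 value shifts by +1.
array_b = [x + 1.0 for x in array_a]

centered_a = median_center(array_a)
centered_b = median_center(array_b)
```

After centering, the two arrays are identical, so the loading difference no longer shows up as apparent differential expression.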

5

I agree with the normalization problem being moot. I strongly disagree with "Microarrays are to a large extent being replaced by RNA-seq methods". This idea is perpetuated by software people and hardcore technophiles. While microarray software development is clearly being replaced by RNA-seq development, this hardly applies to the application level (where results count, not coolness). I would estimate the ratio of microarray to RNA-seq experiments to still be 1000:1. This might change, though. (Disclosure: I earn money with analysing microarray data)

1

Agreed, microarrays are not being completely replaced by RNA-seq. Still, their importance has decreased due to the competition from RNA-seq :-)

0

For those interested in this topic, here is a paper that might be useful: www.stat.berkeley.edu/tech-reports/800.pdf

6
13.2 years ago
Mary 11k

It might be an interesting exercise to check out the original Pedro's List (some of you will know what that is) and consider which tools have persisted, which evolved, and which have vanished. Of course, some of the vanished ones will have been funding-related and not topic-specific deaths.

0

Yes, I have been in this field this long....

0

Old days remembered, and there are some Gopher site links too on Pedro's page. How about looking into some old bioinformatics books, such as Methods in Enzymology Vol. 183 (1982) and 266 (1997)? The first book on bioinformatics I read, in the fall of 2000 when I joined a university course, was "Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit" by Gunnar von Heijne (1987).

0

There is also the first NAR Database issue in the mid-90s, and some of the subsequent ones. Another blast from the past: NCBI in the mid-90s I found on the Wayback Machine once: http://blog.openhelix.eu/?p=2577 1997. Oy.

5
13.2 years ago
Stephen 2.8k

What about methods for multiple testing correction in genome-wide association studies (GWAS)?

On one hand, a p-value of 5e-8 for "genome-wide significance" is the de facto standard for GWAS, based on some work showing that testing all common variants results in about 1 million effectively independent tests. Furthermore, if I'm following up findings in replication studies or in the lab, I'm going to take my top x% of genes/variants, based on whatever I can afford, not whatever the corrected p-value happens to be.

But on the other hand I have attended the IGES meeting for the last 5 years, and there are always several talks and many posters on new methodology for correcting p-values for multiple testing in GWAS.
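For readers wondering where 5e-8 comes from, it is what a plain Bonferroni correction gives for about 1 million effectively independent tests. A minimal sketch, assuming the conventional family-wise alpha of 0.05 (that value and the function name `bonferroni_threshold` are assumptions for illustration, not something stated above):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 and ~1 million effectively independent tests.
threshold = bonferroni_threshold(0.05, 1_000_000)
```

The newer methods presented at meetings mostly try to be less conservative than this when tests are correlated, which is a separate question from how many hits one can afford to follow up.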

2

It won't be irrelevant. There will always be people who want to make sure their stats are correct, and statisticians interested in working on problems like this.

There may be lots of work published that doesn't bother, but the same can be said for pretty much all of molecular biology.

4
13.2 years ago

I don't think people are working on "protein music" anymore :-)

Ross D. King and Colin G. Angus
PM - Protein music
Comput Appl Biosci (1996) 12(3): 251-252 doi:10.1093/bioinformatics/12.3.251

http://bioinformatics.oxfordjournals.org/content/12/3/251.full.pdf+htm

"We present the program PM for the analysis of protein sequence information using audification..."

1

There is a recent one from India here :) http://sites.google.com/site/achushome/other-interests#gene-music. You can download the track of Protein music in Kalyani there.

1

Well maybe obsolete for proteins, but not for genome alignments:

"ComposAlign was developed to sonify large scale genomic data. The resulting musical composition is based on Common Music and allows the mapping of genes to motives and species to instruments. It enables the researcher to listen to the musical representation of the genomewide alignment and contrasts a bioinformatician's sight-oriented work at the computer." (2009)

http://www2.bioinf.uni-leipzig.de/ComposAlign/examples/

0

:-) .

0

Sounds funny :) I wonder what the hemoglobin protein music will sound like.

1
13.2 years ago
  1. Optimal pairwise global alignment, see Needleman and Wunsch (1970) for solution.

  2. Optimal pairwise local alignment, see Smith and Waterman (1981) for solution.
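Both solutions fit in a few lines of dynamic programming, which is part of why the topic feels closed. Below is a minimal sketch that computes only the optimal scores (no traceback), assuming a simple linear gap penalty and match/mismatch scores of +1/-1; those scoring values and the function names are illustrative assumptions, not the scoring schemes of the original papers.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style: best score aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The extra 0 lets an alignment restart anywhere, which makes it local.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Both run in time proportional to len(a) * len(b), which is exactly the cost that becomes a problem at genome scale.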

1

Maybe because finding optimal solutions for pairwise alignment is an obsolete topic... Since massive data sets arrived, "optimal" is now often replaced by "the lesser evil"!

1

I remember hearing a couple of years ago about using specialized hardware (FPGAs/GPUs?) to accelerate these alignments and make them feasible. So some work might still be done on the implementation side of things, and it might possibly work for small genomes.

1
13.2 years ago
Bach ▴ 550

I suppose that in-silico gene prediction (starting from a sequence) for prokaryotes will be going the way of the dodo pretty quickly because:

  1. there are already a couple of tools which do the job "good enough"
  2. the ever-growing databases at EMBL and NCBI provide a good BLAST base, so assignment is done more and more on similarity
  3. RNA profiling is comparatively cheap nowadays and gives much better accuracy on the transcript boundaries, from which, with a bit of curation, one should be able to get pretty good gene boundaries
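
For anyone unfamiliar with what prediction "from a sequence" means at its most basic, here is a minimal sketch of a naive open reading frame scan. It is a toy under loud assumptions: forward strand only, ATG as the only start codon, and an arbitrary `min_codons` cutoff. Real prokaryotic gene finders use statistical models on top of this idea; the function name `find_orfs` is made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand ORFs, end exclusive.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon, and the stop codon is included in the reported span.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


example = "CCATGAAATTTTAGCC"
orfs = find_orfs(example)
```

A scan like this over-predicts badly and has trouble with short proteins, since a low length cutoff lets in many spurious ORFs.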

1
13.1 years ago

Protein secondary structure prediction using a 3-letter alphabet is pretty dead, but there may be some life left in developing more detailed views of local protein structure.

RNA secondary structure prediction still needs a lot of work, particularly for the more unusual RNA structures that involve things other than pairing on the Watson-Crick edge.
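To show how restrictive the classic formulation is, here is a minimal sketch of Nussinov-style base-pair maximization that only allows Watson-Crick pairs and nested (pseudoknot-free) structures. It is a textbook simplification, not an energy model, and the `min_loop` default of 3 unpaired bases in a hairpin is an assumption for illustration.

```python
# Only canonical Watson-Crick pairs; G-U wobble and other edges are ignored.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested Watson-Crick pairs in rna (Nussinov-style DP)."""
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = max pairs within rna[i..j]; spans too short to pair stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with some k
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

Everything outside this model, such as non-canonical pairs, base triples, and pseudoknots, is invisible to it.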

In-silico prediction of prokaryote genes is still far from a solved problem. People have mostly punted on the short proteins, and the RNA genes are still woefully underannotated. The archaeal gene predictions are particularly bad.

0
13.2 years ago
Woa ★ 2.9k

How about RNA secondary structure prediction? Is the topic still 'hot'?

0

I think it's going to become a "hot" topic again thanks to the discovery of long non-coding RNAs.

0

That is a good example of the ups and downs in popularity. I agree with GWW: whenever one discovers non-coding transcripts, this becomes interesting. I had a similar problem discussed here (ofc that doesn't really make it a hot topic automatically ;) )

0

It's not dead yet. Predicting RNA secondary structure of mRNAs has also received recent interest (including from my lab).

0

And what about predicting pseudoknots? Or RNA-RNA hybridization? These are still alive.

0
12.8 years ago

De novo gene prediction and genome annotation based on conservation, GC content, HMMs, etc.

These kinds of methods were hot when sequencing ESTs and full-length transcripts was prohibitively expensive. Now you can just sequence the transcriptome and let the data tell you what regions of the genome are being transcribed and what the exon/intron structures of those transcripts are (both canonical and minor isoforms).

As RNA-seq and related assembly methods improve, annotating a new genome will become increasingly routine and automated. A lot of genome and cDNA 'finishing' tasks have already been eliminated by these new types of data and related analysis pipelines.

0
12.8 years ago
Pals ★ 1.3k

The dogma of disordered (regions of) proteins. It seems to be a very interesting and important topic in structural bioinformatics. Although there are a number of programs, I was unable to find a reliable one.

The importance of this area has been highlighted in Breaking the Protein Rules.
