Dear friends,
Over the past few years we have seen more and more data being generated, and nowadays it is common to order genetic tests for patients. Many companies have also started working on this type of data, even companies like Google.
I wanted to ask this question in order to share information, experiences, and perspectives that could be useful for our local region.
Do you think that simply analyzing raw or small-scale data will continue over the next few years as it does now, or will we move into other areas, such as:
1- Using deep learning (or other machine learning) methods to extract patterns from genomic/transcriptomic data.
2- Using NGS results for drug research / repositioning.
Thanks.
Two words:
Long reads.
Two more:
Hybrid assemblies.
*Though one could make the case that that is happening now rather than in the next few years.
I do agree. Long reads are game-changing in more than one way.
For most people, bioinformatics today is a narrow concept: it really means dealing with the various constraints that billions of short reads impose on us. What changes when we have few but long reads? ... Well ... everything.
Right off the bat, BAM does not even work: a standard that, five years later, cannot store the data it was meant to standardize. This is what the "premature optimization is the root of all evil" quote is about.
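To make that concrete: BAM stores the number of CIGAR operations in a 16-bit field, so an ultra-long, noisy read whose alignment needs more than 65,535 operations cannot be encoded in a classic record (newer htslib versions work around this with a CG tag). A minimal sketch, assuming plain CIGAR strings as input, that checks for this:

```python
import re

# BAM's n_cigar_op field is a 16-bit integer, so alignments with more
# than 65535 CIGAR operations cannot be stored in a classic BAM record.
BAM_MAX_CIGAR_OPS = 65535

def cigar_op_count(cigar: str) -> int:
    """Count the operations in a SAM CIGAR string, e.g. '100M2I50M' -> 3."""
    return len(re.findall(r"\d+[MIDNSHP=X]", cigar))

def fits_in_bam(cigar: str) -> bool:
    """Return True if this alignment's CIGAR fits in a classic BAM record."""
    return cigar_op_count(cigar) <= BAM_MAX_CIGAR_OPS

# A short-read alignment is fine; a noisy megabase-scale alignment may not be.
print(fits_in_bam("100M2I50M"))  # True
```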
Then interpreting and tuning alignments (something we almost never need to do today) becomes critical.
The entire basis of alignment the way we do it today becomes obsolete. Alignments are narrow, abstract mathematical concepts that don't work well for long reads, where the sequences have a biological function. Alignment scoring should be dynamic (and depend on the information content of the region) rather than using uniform rewards and penalties. We can already see the effects of introducing convex gap penalties (minimap2 vs. bwa mem): minimap2's long-read alignments are a major step in the right direction (even though, mathematically, bwa mem is "more" correct), but that is just the beginning; there is a lot more to go there.
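To illustrate the gap-cost difference being described: bwa mem charges gaps with a single affine penalty, while minimap2 uses a two-piece affine cost whose minimum grows more slowly for long gaps, which is the "convex" behavior mentioned above. A toy sketch with illustrative penalty values (not the tools' actual defaults):

```python
def affine_gap_cost(k: int, open_: int = 6, extend: int = 1) -> int:
    """Single affine gap cost: penalty grows linearly with gap length k."""
    return open_ + extend * k

def two_piece_gap_cost(k: int,
                       open1: int = 4, extend1: int = 2,
                       open2: int = 24, extend2: int = 1) -> int:
    """Two-piece affine gap cost: the minimum of two affine functions,
    so very long gaps (e.g. spanning a structural variant) are charged
    a lower per-base cost instead of being broken into separate alignments."""
    return min(open1 + extend1 * k, open2 + extend2 * k)

# Short gaps cost roughly the same under both schemes, but a 1000 bp gap
# is much cheaper under the two-piece cost.
for k in (5, 50, 1000):
    print(k, affine_gap_cost(k), two_piece_gap_cost(k))
```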
But now, all of a sudden, the traditional dynamic programming algorithms, Needleman-Wunsch and Smith-Waterman, become outdated. And guess what: every aligner uses these. Now we need entirely new breeds of algorithms and aligners.
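For reference, here is the classic Needleman-Wunsch recurrence in a minimal form. It fills a matrix of size (n+1) x (m+1), so time and memory grow with the product of the sequence lengths, which is exactly what becomes painful for very long reads. A toy sketch with illustrative match/mismatch/gap scores:

```python
def needleman_wunsch(a: str, b: str,
                     match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score via the classic Needleman-Wunsch DP.
    Fills a (len(a)+1) x (len(b)+1) matrix: O(n*m) time and memory."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]

# For two 100 kb reads this matrix already has 10^10 cells,
# which is why long-read aligners need different strategies.
print(needleman_wunsch("GATTACA", "GATCA"))
```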
Great, so we will need new types of programming in aligners.
That's if you are doing NGS for biology. If you are doing NGS for medicine, the field will probably be dominated by single-cell RNA-seq.
If we are talking about the future (perhaps even further out than a few years), I would be tempted to drop the "two more".
Very long reads, making assembly obsolete.
I was thinking that too, though less because of assembly issues and more because of accuracy. Short-read techniques are still the go-to for low error rates, while long reads make for nice, easy assembly.
agreed!
though they'll likely resolve that too in the future ;)
I also agree with predeus's remark. I'm pretty biased towards my own field of research, and indeed other fields will have a different view on this.
A similar post:
Where and how NGS techniques are heading for the next 5 years?