Hi, I am new to bioinformatics analysis and come from a different background.
So I would like to know whether my idea is totally dumb or whether I should keep working on it.
I wonder what it would be like if we stored DNA sequence data as pixels, giving a different RGB value to each nucleotide. Wouldn't it be easier and more user-friendly to store the data, compare sequences, and find alignments using image processing tools? I imagine a standard color code for annotations, which could be extended for further analysis. Finding similar sequences by overlapping images would be easy. I could use machine learning algorithms to find patterns that distinguish genes from non-coding sequences.
Does this make any sense to you, or should I keep studying and not waste my time on these ideas? :)
Thank you
One-hot encoding does this (or you can rethink your problem in this way): https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
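To make that concrete, here is a minimal sketch of one-hot encoding a DNA sequence in plain Python. The function name `one_hot`, the A/C/G/T column order, and the choice to encode any other symbol (such as N) as an all-zero row are assumptions made for illustration, not a standard.

```python
# Column order is an arbitrary choice for this sketch.
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Return one row per base, with a single 1 in the column of that base.

    Symbols outside A/C/G/T (for example N) become an all-zero row;
    that handling is an assumption of this sketch.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in seq.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("ACGTN", one_hot("ACGTN")):
        print(base, row)
```

The result is a length-by-4 matrix, which is already the kind of numeric input most machine learning models expect, so no detour through colors is needed.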
There are plenty of different ways data can be encoded or represented. However, not all of them are equally useful. You need to ask what the benefits of the change of representation are. As you've found already, image representations of sequences have been tried before, but they haven't really had an impact in the field, so the purported advantages may not be that compelling. The machine learning field has been looking at applying its shiny new toys to as many fields as possible, and bioinformatics is no exception. There were quite a few papers a couple of years ago using image representations of sequences as input to CNNs. The problem is that the published papers I've seen don't compare against a state-of-the-art bioinformatics approach (if they do a comparison at all), so I am unimpressed.
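For reference, the pixel encoding proposed in the question can be sketched in a few lines of Python. The color table `BASE_TO_RGB`, the gray fallback, and the helper names `to_pixels` and `overlap_fraction` are all hypothetical; there is no standard color code for nucleotides.

```python
# Hypothetical color assignment; any four distinct colors would do.
BASE_TO_RGB = {
    "A": (0, 255, 0),
    "C": (0, 0, 255),
    "G": (255, 255, 0),
    "T": (255, 0, 0),
}
UNKNOWN_RGB = (128, 128, 128)  # assumed fallback for N and other symbols


def to_pixels(seq):
    """Map each base to one RGB pixel, giving a 1 x len(seq) image row."""
    return [BASE_TO_RGB.get(base, UNKNOWN_RGB) for base in seq.upper()]


def overlap_fraction(seq_a, seq_b):
    """Fraction of identical pixels when the two rows are overlaid.

    Only the first min(len) positions are compared; no shifting is tried.
    """
    pixels_a, pixels_b = to_pixels(seq_a), to_pixels(seq_b)
    n = min(len(pixels_a), len(pixels_b))
    if n == 0:
        return 0.0
    matches = sum(pa == pb for pa, pb in zip(pixels_a, pixels_b))
    return matches / n


if __name__ == "__main__":
    print(overlap_fraction("ACGTACGT", "ACGTACGT"))
    # Same sequence with one extra A inserted at the start.
    print(overlap_fraction("ACGTACGT", "AACGTACG"))
```

Overlaying two such rows is just a position-by-position comparison of the bases. In this sketch a single inserted base shifts every later pixel, so the second pair matches at only 1 of 8 positions even though the sequences are nearly identical. Handling such shifts is the job of sequence alignment, and the color encoding by itself does not do that.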