This may be less technical than most questions here but it's something I've been wondering about for some time.
I have studied biology and, to a lesser degree, bioinformatics (one Python bioinformatics course). It seems to me the holy grail of bioinformatics would be to plug in a sequence and predict all the proteins in the organism and how they would be expressed: essentially the complete inter-conversion of genotype and phenotype in silico, which is of course physically possible but technically challenging.
My question is to what extent this is possible today, and what the biggest hurdles are to accomplishing it. Realistically I'm thinking of less ambitious comparisons: can we compare the sequences of a black Labrador and a white Labrador and confidently say which genes or promoters are responsible for the difference in colour? Obviously it's easier when we know which metabolic pathway to check, but how difficult is it to subtract one genome from another and then assign a phenotypic effect to each of those differences?
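To make the "subtraction" part concrete: listing the raw differences between two aligned sequences is trivial, and that's exactly why it's striking that the interpretation step is so hard. A toy sketch (the sequences, the "promoter fragment" label, and the variant position are all invented for illustration; real pipelines work on aligned reads and called variants, not raw strings):

```python
# Toy illustration: "subtracting" one genome from another is the easy part;
# mapping each difference to a phenotypic effect is the hard part.
# Sequences below are made up and assume a perfect, gap-free alignment.

def diff_sequences(ref, alt):
    """Return (position, ref_base, alt_base) for every single-base difference."""
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]

black_lab = "ATGCCGTACGTTAGC"   # hypothetical promoter fragment
white_lab = "ATGCCGTACATTAGC"   # differs at a single position

variants = diff_sequences(black_lab, white_lab)
print(variants)  # [(9, 'G', 'A')]
```

Between two real dog genomes this list would contain millions of entries, almost all of them phenotypically neutral, which is one reason you can't simply read the coat colour off the diff.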
It seems to me that machine learning and other clever techniques have been applied to problems like image recognition and natural language processing, where the inputs are significantly less friendly to computer processing than sequencing data is.
Is it an issue of computing power, programming, insufficient sequencing data or something else?
My 2p. Regarding machine learning and image recognition, I'd point out that these applications can rely on a high quantity and quality of data to train the underlying algorithms. For example (I don't have the numbers at hand), there must be millions of handwritten postcodes for which you know the true answer, and you can use them for the training, cross-validation, etc. of image-recognition methods. In biology it's much more difficult to generate such good training sets.
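The effect of training-set size can be shown with a minimal, purely illustrative sketch: synthetic one-dimensional data stands in for "handwritten postcodes", and a nearest-centroid rule stands in for a real classifier (all numbers and class centres are made up):

```python
# Sketch of why labelled-data volume matters: a nearest-centroid classifier
# trained on more examples estimates the class centroids more accurately.
# Synthetic 1-D data with two overlapping classes; nothing here is real biology.
import random

random.seed(0)

def sample(label, n):
    # class 0 centred at 0.0, class 1 at 1.0, with overlapping noise
    centre = 0.0 if label == 0 else 1.0
    return [(random.gauss(centre, 0.7), label) for _ in range(n)]

def train(data):
    # estimate one centroid per class from the labelled training points
    centroids = {}
    for lbl in (0, 1):
        xs = [x for x, l in data if l == lbl]
        centroids[lbl] = sum(xs) / len(xs)
    return centroids

def accuracy(centroids, test):
    # classify each test point by its nearest centroid
    correct = sum(1 for x, l in test
                  if min(centroids, key=lambda c: abs(x - centroids[c])) == l)
    return correct / len(test)

test_set = sample(0, 500) + sample(1, 500)
for n in (5, 50, 5000):
    train_set = sample(0, n) + sample(1, n)
    print(n, round(accuracy(train(train_set), test_set), 3))
```

With only a handful of labelled examples the centroid estimates are noisy and accuracy suffers; with thousands it approaches the best this simple rule can do. Getting the biological equivalent of those thousands of verified labels (phenotype measurements matched to sequences) is exactly the bottleneck.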