Hi Biostars,
This question is based on a question by user56:
"Would you like to have a reference sequence - would it be a healthy/desirable genome? "
It got me thinking again about an argument I've had with some scientists. I would be interested in your opinions, but I do not want to move the conversation away from user56's original question.
I am working on a family of highly related DNA viruses. The viruses with clinical implications were isolated and sequenced roughly 30 years ago. Obviously, sequencing wasn't trivial in those days, and the fact that those sequences have stood the test of time as well as they have is a testament to how careful the original researchers were. However, we now know that several of these "prototype viruses" have errors in them.
I am a proponent of using all available data (additional sequences, biological information, ...) to correct these genomes. E.g., a frame-shift in an essential protein is unlikely to be "real". My argument is that, in order for scientists to be able to communicate, it is essential to have a "reference genome" that looks and feels normal (i.e. all ORFs are where they should be, and evolutionarily it all makes sense).
This way it is easier to map candidate SNPs on this (artificial) genome.
However, several (older) scientists appear hesitant to accept changes to these genomes.
I would be interested to hear what this community thinks about correcting reference genomes in order to provide a (bioinformatic) framework.
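To make the "looks and feels normal" criterion a bit more concrete, here is a minimal Python sketch of the kind of sanity check I have in mind for an annotated ORF. The function name `orf_problems`, the 0-based end-exclusive forward-strand coordinates, the ATG start codon, and the standard stop codons are all assumptions made for illustration; this is not the procedure used for any particular database.

```python
# Hypothetical helper, for illustration only: lists reasons an ORF looks broken.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumes the standard genetic code


def orf_problems(genome, start, end):
    """Return a list of reasons why genome[start:end] is not a clean ORF.

    Assumes 0-based, end-exclusive coordinates on the forward strand.
    An empty list means the ORF passed every check.
    """
    orf = genome[start:end].upper()
    problems = []

    # A length that is not a multiple of 3 hints at an indel (frame-shift).
    if len(orf) % 3 != 0:
        problems.append("length is not a multiple of 3")

    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]

    if not codons or codons[0] != "ATG":
        problems.append("does not begin with ATG")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature stop codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    return problems


if __name__ == "__main__":
    intact = "ATGAAACCCTAA"
    shifted = "ATGAACCCTAA"  # same ORF with one base deleted
    print(orf_problems(intact, 0, len(intact)))
    print(orf_problems(shifted, 0, len(shifted)))
```

A flagged ORF is only a candidate for correction; whether the frame-shift is a sequencing error or real biology still has to be decided from the additional sequences and biological information.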
I do understand the resistance. The weird thing is that different labs are using different "Gold Standards", which completely confuses everyone. I recently published a review in which I systematically "fix" genomes. Most people are positive, but not all...
Just stick with it and "advertise" it at any opportunity you get. It takes a few years for something new to get adopted. By the way, can you post the link to your paper? I am intrigued.
Database: pave.niaid.nih.gov. Paper: in press at Nucleic Acids Research.