Question

Value Of Reference Genome (Based On Recent Post)

1

12.8 years ago

Whetting ★ 1.6k

Hi Biostars,
This question is based on a question by user56:

"Would you like to have a reference sequence - would it be a healthy/desirable genome? "

It got me thinking again about an argument I've had with some scientists. I would be interested in your opinions, but I do not want to move the conversation away from user56's original question.

I am working on a family of highly related DNA viruses. The viruses with clinical implications were isolated and sequenced roughly 30 years ago. Obviously, sequencing wasn't trivial in those days, and the fact that those sequences have stood the test of time as well as they have is a testament to how careful the original researchers were. However, we now know that several of these "prototype viruses" have errors in them.

I am a proponent of using all available data (additional sequences, biological information, ...) to correct these genomes. For example, a frame-shift in an essential protein is unlikely to be "real" and more likely reflects a sequencing error. My argument is that, for scientists to be able to communicate, it is essential to have a "reference genome" that looks and feels normal (i.e. all ORFs are where they should be, and it all makes sense evolutionarily).

This way it is easier to map candidate SNPs on this (artificial) genome.

However, several (older) scientists appear hesitant to accept changes to these genomes.
I would be interested to hear what this community thinks about correcting reference genomes in order to provide a (bioinformatic) framework.

reference genome • 2.4k views

Updated 12.8 years ago by Ido Tamir 5.2k • written 12.8 years ago by Whetting ★ 1.6k

score 2 · Answer 1 · 2012-09-25

2

12.8 years ago

Istvan Albert 102k

I do understand the origins of the resistance to change.

Correcting a reference also implicitly alters one of its core values: the standardization that it provides. Soon there will be people using both the old and the new references. Then arguments break out over what should and shouldn't be included, things split in even more directions, and so on.

To get more people on board, you would need to offer them radical improvements, not just a better solution.

0

I do understand the resistance. The weird thing is that different labs are using different "Gold Standards", which completely confuses everyone. I recently published a review in which I systematically "fix" genomes. Most people are positive, but not all...

Reply 12.8 years ago by Whetting ★ 1.6k

0

Just stick with it and "advertise" it at any opportunity you get. It takes a few years for something new to get adopted. By the way, can you post the link to your paper? I am intrigued.

Reply 12.8 years ago by Istvan Albert 102k

0

Database: pave.niaid.nih.gov. Paper: in press at Nucleic Acids Research.

Reply 12.8 years ago by Whetting ★ 1.6k

score 1 · Answer 2 · 2012-09-26

For larger genomes, this is standard practice. Builds are versioned, and one bases one's analysis on a specific build. Genome annotations are versioned as well. It is only a question of documentation (provenance): the researcher needs to state which build was used in the analysis. You could start by labelling the old "gold standard" (if there is one) as .0, and yours as w1 or w201209.
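
As a rough illustration of the versioning idea, here is a minimal Python sketch of how a build label and a sequence checksum could be combined into a provenance string to quote in a methods section. The class name `ReferenceBuild`, the genome name "VirusX", and the toy sequences are all hypothetical, and using MD5 of the upper-cased sequence as an identifier is just one possible convention, not something prescribed in this thread.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceBuild:
    """One versioned build of a reference genome (hypothetical helper)."""
    name: str      # genome name, e.g. a prototype virus (made up here)
    version: str   # build label, e.g. ".0" for the original, "w201209" for a corrected build
    sequence: str  # nucleotide sequence of this build

    @property
    def checksum(self) -> str:
        # Upper-case first so that soft-masked (lower-case) bases
        # do not change the identity of the build.
        return hashlib.md5(self.sequence.upper().encode("ascii")).hexdigest()

    def provenance(self) -> str:
        # A short string a researcher could state alongside an analysis.
        return f"{self.name} build {self.version} (md5:{self.checksum})"


# Toy sequences: the corrected build inserts one base relative to the original.
original = ReferenceBuild("VirusX", ".0", "ATGGCCTTAA")
corrected = ReferenceBuild("VirusX", "w201209", "ATGGCCATTAA")

print(original.provenance())
print(corrected.provenance())
```

Because the checksum depends only on the sequence, two labs that claim to use the same build can check that they really do, and a SNP position can always be traced back to the exact sequence it was called against.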