Question

How can recombination and a low mutation rate confound phylogenetic signal?


4.1 years ago

2001linana ▴ 40

I was reading the article "Nextstrain: real time tracking of pathogen evolution" and came across the following: "there is a growing need for surveillance of non-influenza viruses, and Nextstrain is able to be extended to most outbreaks with readily accessible genomic data, although we note the potential for recombination or low mutation rate to confound phylogenetic signal." My understanding is that a high mutation rate makes the phylogenetic signal clearer, whereas a low mutation rate makes it harder to detect. What about recombination? How should I think about it? Is it a one-time event that influences the phylogenetic signal in some way? Any comments are greatly appreciated.

sequence gene • 1.0k views



Thank you for your kind reply. Do you know of any bioinformatics or computational research topics concerning recombination in SARS-CoV-2?

4.1 years ago by 2001linana ▴ 40

score 1 · Answer 1 · 2020-11-11

That's right - a sufficient amount of genetic diversity is required to differentiate phylogenetic taxa/samples. However, "too much" divergence can also introduce artefacts. One common example is long-branch attraction, which can arise when the outgroup is too divergent from the assumed "ingroup"[1]. Highly divergent data can also lead to inaccurate recombination estimates[2].
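To make the "sufficient diversity" point concrete, here is a minimal sketch that counts variable and parsimony-informative columns in an alignment. The function name `site_summary` and both toy alignments are made up for illustration; they are not from the Nextstrain paper or any real dataset.

```python
from collections import Counter

def site_summary(alignment):
    """Return (variable, parsimony_informative) column counts for aligned sequences."""
    variable = informative = 0
    for column in zip(*alignment):
        # Ignore gaps and ambiguity codes; only count unambiguous bases.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each present in two or more sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative

# Hypothetical toy alignments (not real viral data).
low_diversity = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
higher_diversity = ["ACGTACGTAC", "ACGTTCGTAC", "ATGTACGAAC", "ATGTACGAAC"]

print(site_summary(low_diversity))     # one variable site, none informative
print(site_summary(higher_diversity))  # three variable sites, two informative
```

With a low mutation rate most columns are identical, so there are few or no sites that can group samples together, which is roughly what "low mutation rate confounding the signal" looks like in the data.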

In RNA viruses, recombination occurs during replication, when the polymerase switches from one template strand to another before copying is complete (the copy-choice mechanism)[3]. This produces a recombinant. Traditional phylogenetic models assume that each sample has a single evolutionary origin. A recombinant has more than one and can violate this assumption. Depending on which genomic region you're looking at, a recombinant could be more similar to a distantly related genotype, which distorts the topology and branch lengths of phylogenies[4]. Failing to account for recombination has also been shown to affect other evolutionary inferences[5].
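The "depends on which genomic region you're looking at" idea can be sketched with a simple sliding-window identity scan, loosely in the spirit of similarity-plot approaches. This is only an illustration, not a recombination-detection method: the function names, the window settings and the toy sequences are all assumptions made up for this example.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length aligned strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def window_scan(query, parent_a, parent_b, window=10, step=5):
    """Return (start, identity_to_a, identity_to_b) for each window along the alignment."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        results.append((start,
                        identity(query[start:end], parent_a[start:end]),
                        identity(query[start:end], parent_b[start:end])))
    return results

def closest_parent(scan):
    """Label each window by whichever parent it matches best ('tie' if equal)."""
    return ["A" if a > b else "B" if b > a else "tie" for _, a, b in scan]

# Hypothetical toy sequences: two "parents" and a mosaic built from both.
parent_a = "ACGTACGTAC" * 3
parent_b = "ACTTACGAAC" * 3
recombinant = parent_a[:15] + parent_b[15:]  # artificial breakpoint at position 15

scan = window_scan(recombinant, parent_a, parent_b)
print(closest_parent(scan))  # labels switch from A to B around the breakpoint
```

A single tree built from the whole sequence has to place the mosaic somewhere between its two parents, which is one way the topology and branch lengths end up distorted.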

Recombination is definitely not a one-time event. Similar to mutation rate, recombination rates vary widely between organisms. Generally, negative-sense single-stranded RNA viruses have low recombination rates, whereas retroviruses like HIV are highly recombinant. Many positive-sense single-stranded RNA viruses (coronaviruses are one) also recombine frequently, while segmented viruses like Influenza mostly exchange whole genome segments through reassortment. The interplay of recombination and mutation helps generate genetic diversity in a viral population[6].

To emphasise how recombination isn't a one-off occurrence - historically, studies have elucidated recombination between genotypes, lineages or hosts. A recent SARS-CoV-2 study identified ancient recombination events to help understand the evolutionary origins of the virus[7]. More recently, deep sequencing of viruses has shed light on recombination that occurs between viruses within a single host. It's still quite difficult to analyse within-host data because the sequences are so similar[8], much like the issue of a low mutation rate confounding the phylogenetic signal.

Overall, recombination remains biologically and computationally complex. As the Nextstrain team have alluded to, further method development is required to account for recombination between highly similar sequences while remaining scalable enough to process the immense number of sequences available (e.g. ~60,000 global SARS-CoV-2 sequences[9]).

Hopefully some of this is helpful! I've included some references throughout for further reading :)