Question

Viral co-infection strains decomposition

4.9 years ago

stefan.dvorezky ▴ 20

Hello everyone, I have BAM files of viral reads for SARS-CoV-2, and interestingly many variant positions show 2-3 large fractions (for example, C->T at 40% and C->G at 45% at the same position). I am not an expert in virology and wonder where this comes from. The only explanation I have is the "multiple strains hypothesis": either co-infection with multiple viral strains, or co-development of strains within the host (or both?). In that case it would be great to separate these strains in silico, i.e. to go from one BAM to a FASTA of the 1st putative strain in the host, the 2nd putative strain, and so on. Does anybody know of algorithms, software, or publications that have approached this?

rna-seq SNP next-gen virus SARS-CoV-2 • 1.1k views

updated 4.9 years ago by psjalma ▴ 10 • written 4.9 years ago by stefan.dvorezky ▴ 20

Were you able to find a method to use for your work? I am curious as I have encountered the same issues myself.

4.4 years ago by Tawny ▴ 180

score 1 · Answer 1 · 2020-06-17

That is interesting. I am not an expert on the computational aspects of your question. However, if the read fractions of the two (or more) genotypes differ consistently and by a clear margin (for example, strain #1 with ~60% of reads and strain #2 with ~40%), then it should be possible to classify them reliably. Such positions can be placed in two columns by sorting the VCF file with an Excel formula, separating the bases whose proportion falls between 20-45% from those whose proportion falls between 55-80%. In one of our papers, my bioinformatician colleagues did exactly this to differentiate two genotypes after experimental inoculation; you can refer to it for the details (I have since moved to another institute): https://pubmed.ncbi.nlm.nih.gov/29665434/

SNPs that are present in both strains would appear in over 95% of the reads and can be considered common to both (or all) strains.
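For anyone who prefers a script to a spreadsheet, here is a minimal sketch of the frequency-binning idea described above. It is an illustration under stated assumptions, not the method used in the linked paper: the `Variant` record, the function names, and the exact cut-offs (20-45% minor, 55-80% major, 95% and above shared) are hypothetical choices, and the alternate-allele fractions are assumed to have been extracted from the VCF already. It only handles SNPs, and it ignores read-level linkage between positions.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical record for one SNP call; not a real VCF parser."""
    pos: int    # 1-based position on the reference
    ref: str    # reference base
    alt: str    # alternate base
    af: float   # fraction of reads supporting alt, between 0 and 1


def classify(af, minor=(0.20, 0.45), major=(0.55, 0.80), shared_min=0.95):
    """Assign an allele fraction to a bin; the thresholds are assumptions."""
    if af >= shared_min:
        return "shared"
    if major[0] <= af <= major[1]:
        return "major"
    if minor[0] <= af <= minor[1]:
        return "minor"
    return "unassigned"


def split_by_frequency(variants):
    """Group variants into shared / major / minor / unassigned bins."""
    bins = {"shared": [], "major": [], "minor": [], "unassigned": []}
    for v in variants:
        bins[classify(v.af)].append(v)
    return bins


def apply_snps(reference, variants):
    """Return a copy of the reference with the given SNPs substituted in."""
    seq = list(reference)
    for v in variants:
        if len(v.ref) != 1 or len(v.alt) != 1:
            continue  # indels are out of scope for this sketch
        if reference[v.pos - 1] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        seq[v.pos - 1] = v.alt
    return "".join(seq)


def putative_strains(reference, variants):
    """Build one sequence per frequency bin; shared SNPs go into both."""
    bins = split_by_frequency(variants)
    return {
        "major": apply_snps(reference, bins["shared"] + bins["major"]),
        "minor": apply_snps(reference, bins["shared"] + bins["minor"]),
    }
```

Note that this only makes sense when the two fractions are well separated and stay consistent along the genome. A position with two different alternate alleles at similar fractions (such as the C->T 40% / C->G 45% case in the question) lands both calls in the same bin, which this sketch does not resolve.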

Indeed, I am myself looking for a way to analyze ~1000 bp amplicon sequencing data to see whether mixed genotypes are present. I would appreciate any suggestions on this.

Best wishes and regards, and I hope all of you remain healthy and safe wherever you are.

Pushpendra