I pulled all of the human lung-infecting coronaviruses from the NCBI viral variation database and ran a multiple sequence alignment using MUSCLE. About 200 FASTA files went into this alignment, and it took about 2 hours to process. I cannot get much further than this, as I am not a virologist and I only dabble in bioinformatics.
Here is the link to the ncbi viewer with the results: https://www.ncbi.nlm.nih.gov/projects/msaviewer/?key=4VJ7i_1QIomOfmyOrW9acArNB-NYklaXWpFyh2aDdK3ln9aglZpH6JhNrY3UrlXTBMtZ30f7HP5b5E_pSe9F9HfdStNm70w,qhkwwLYbacLFNSfF5iQRO0GGRIwb_RX4Gf4x6CXsN8Km8JXP1fXtS_zdyfevS_s2qi73Oukeshv1AeEM5wrrEdk45DbICuI
How should one attempt to glean biological insight from these data? Zoom in on certain protein-coding regions and see what's going on? Compare similarity to previous strains of the virus, and compare and contrast the biological course of disease?
It looks like there are many regions of similarity between the COVID-19 genome (top) and some of the MERS coronaviruses from ~2010. The red lines are SNPs, I presume.
Thanks!
Yeah, I am just wondering where to look next. I got the Springer 'Comparative Genomics' book, and it has a short section on viral bioinformatics. It pretty much just tells you to run a multiple sequence alignment and does not expand on the subject past that.
The viruses I used share large regions of similarity, but the Wuhan virus has many SNPs, insertions, and deletions. From that I'm assuming that the MERS and COVID-19 viruses have a common ancestor but that the new virus is actually quite different. I think it might be more interesting to run an MSA with all of the genomes sequenced so far for the current virus and see how they change over time and location. I am not sure how many of those are available yet, but I will see.
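As a rough way to put numbers on "similar" versus "quite different", one option is to compute the percent identity of every aligned genome against one chosen reference, either over the whole alignment or over a window of columns (for example a single gene). The sketch below is only an illustration: it assumes the aligner output is saved as aligned FASTA with `-` for gaps, and the function names and the toy sequences are made up for this example.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def percent_identity(ref, other, start=0, end=None):
    """Percent identity over alignment columns [start, end).

    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch.
    """
    end = len(ref) if end is None else end
    compared = 0
    matches = 0
    for a, b in zip(ref[start:end], other[start:end]):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_to_reference(alignment, ref_name, start=0, end=None):
    """Percent identity of every aligned sequence against one reference."""
    ref = alignment[ref_name]
    return {
        name: percent_identity(ref, seq, start, end)
        for name, seq in alignment.items()
        if name != ref_name
    }


# Toy aligned FASTA (hypothetical data, not real viral sequence).
toy = """>ref
ATGC-ATGCA
>sampleA
ATGC-ATGCA
>sampleB
ATGCTATG-A
"""

aln = parse_fasta(toy)
scores = identity_to_reference(aln, "ref")
for name, score in sorted(scores.items()):
    print(name, round(score, 1))
```

Passing `start` and `end` restricts the comparison to a block of alignment columns, which is one way to "zoom in" on a region. Note that these are alignment column indices, not genome coordinates.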
This is more the kind of study that I am familiar with in virology, but you don't do an MSA for it. You have one reference, in this case SARS-CoV-2 (the virus that causes COVID-19). Every time you sequence new data, from a patient for example, you map the reads to that reference and call variants. There are a few variants of interest that you need to keep an eye on. Since you said you are doing this to educate yourself, building or using a variant-calling pipeline could be a good next step.
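To make the "one reference, call variants" idea concrete, here is a toy sketch. It is not a real pipeline: real variant calling starts from sequencing reads and uses a dedicated read mapper and variant caller. This example instead assumes you already have a sample genome aligned to the reference (two equal-length strings with `-` for gaps) and simply lists the differences in reference coordinates. The function name `call_variants` and the sequences are hypothetical.

```python
def call_variants(ref_aln, sample_aln):
    """List differences between an aligned reference and an aligned sample.

    Returns tuples of (reference_position, kind, ref_base, sample_base),
    where reference_position is 1-based and counts only reference bases.
    An insertion is reported at the last reference base before it.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have the same length")

    variants = []
    ref_pos = 0  # number of reference bases consumed so far
    for r, s in zip(ref_aln.upper(), sample_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == s:
            continue  # identical base, or a gap in both sequences
        if r == "-":
            variants.append((ref_pos, "INS", "-", s))
        elif s == "-":
            variants.append((ref_pos, "DEL", r, "-"))
        else:
            variants.append((ref_pos, "SNP", r, s))
    return variants


# Toy example (hypothetical sequences, not real viral data).
reference = "ATGC-ATGCA"
sample = "ATGTTATG-A"

calls = call_variants(reference, sample)
for pos, kind, ref_base, alt_base in calls:
    print(pos, kind, ref_base, alt_base)
```

Each insertion or deletion is reported one column at a time, and ambiguous bases such as `N` are treated like any other mismatch; a real caller would also weigh read depth and base quality before reporting anything.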