Hi guys, I will be doing a comparative genome analysis of human coronaviruses (including SARS-CoV-2, SARS-CoV and MERS-CoV). For the multiple sequence alignment in my project, I will include several strains of each coronavirus along with their reference sequences. As far as I know, there are not many strains available for most human coronaviruses, except SARS-CoV-2, which now has more than 10k sequences. I cannot handle that many, so how do I select only the important SARS-CoV-2 strains?
I'm also lost on how to retrieve the SARS-CoV-2 sequences. All the retrieved sequences will go through multiple sequence alignment to identify patterns and differences among them.
Any recommendations and opinions will be appreciated. Thank you.
See the tutorial on this very topic from NextStrain.org: https://nextstrain.github.io/ncov/
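If you just want a manageable set for an alignment, one common approach is to subsample so that every place and time period is represented by a few genomes instead of picking strains by hand. Below is a minimal Python sketch of that idea, not the Nextstrain implementation. It assumes you already have a metadata table with one row per genome. The field names (`strain`, `country`, `date`), the example rows, and the `subsample` helper are all hypothetical and only there for illustration.

```python
import random
from collections import defaultdict


def subsample(records, group_by=("country", "month"), per_group=2, seed=0):
    """Keep at most `per_group` randomly chosen records from each group."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    groups = defaultdict(list)
    for rec in records:
        # Derive a year-month field from an ISO date such as "2020-03-15".
        rec = dict(rec, month=rec["date"][:7])
        groups[tuple(rec[k] for k in group_by)].append(rec)

    picked = []
    for key in sorted(groups):
        members = groups[key]
        picked.extend(rng.sample(members, min(per_group, len(members))))
    return picked


# Hypothetical metadata rows; real ones would come from your download.
records = [
    {"strain": "A1", "country": "China", "date": "2020-01-05"},
    {"strain": "A2", "country": "China", "date": "2020-01-20"},
    {"strain": "A3", "country": "China", "date": "2020-02-11"},
    {"strain": "B1", "country": "Italy", "date": "2020-03-02"},
    {"strain": "B2", "country": "Italy", "date": "2020-03-15"},
    {"strain": "B3", "country": "Italy", "date": "2020-03-28"},
]

selected = subsample(records, per_group=1)
print([r["strain"] for r in selected])
```

With `per_group=1` this keeps one genome per country and month, so the six example rows shrink to three. You would then pull only the selected strain names out of your FASTA file and align those together with the reference sequences. Raising `per_group` or grouping on a different field (for example lineage, if your metadata has it) changes how dense the sample is.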