Let's say I have Illumina short reads for a virus, and its reference sequence is also available in a database.
In this case, what should I use to get a full-length genome from my Illumina short reads?
- Genome assembly
- Or, short-read alignment for consensus generation.
I often get confused about when to use which option.
Thanks in advance.
You can do a reference-guided de novo assembly. See this review for more info: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1911-6
I would say this boils down mostly to what you intend to do downstream, and how good the reference is.
If the reference is a complete genome with lots of validation, and you only care about variant identification, I'd say just mapping the reads is the way to go.
If the reference isn't that good, or you intend to do further analysis, there's no harm in assembling the genome.
Of course, you can do both pretty quickly.
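To make the mapping route a bit more concrete, here is a toy sketch of what consensus generation from aligned reads boils down to: count the bases covering each reference position, take the majority base, and mask low-coverage positions with N. The function name, the `(start, sequence)` read format, and the `min_depth` threshold are all made up for illustration, and it assumes ungapped alignments. In practice you would use a read mapper and a variant/consensus caller rather than anything like this.

```python
from collections import Counter


def consensus_from_alignments(reference, aligned_reads, min_depth=2):
    """Toy majority-vote consensus (hypothetical helper, not a real tool's API).

    aligned_reads: list of (start, sequence) tuples, where start is the
    0-based position on the reference and the alignment has no gaps.
    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    # One base counter per reference position.
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    consensus = []
    for column in counts:
        depth = sum(column.values())
        if depth < min_depth:
            consensus.append("N")  # not enough evidence at this position
        else:
            consensus.append(column.most_common(1)[0][0])  # majority base
    return "".join(consensus)


reference = "ACGTACGT"
reads = [(0, "ACGA"), (0, "ACGA"), (2, "GAAC"), (4, "ACGT")]
print(consensus_from_alignments(reference, reads))
```

Note that the consensus carries the sample's base at position 3 (A instead of the reference T), while the last two positions are masked because only one read covers them. A de novo assembly, by contrast, would not use the reference coordinates at all.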
You could also consider the pipeline described in this paper: "A combined de novo assembly approach increases the quality of prokaryotic draft genomes".