I have sequencing data from a few samples of a DNA virus. I'd like to learn de novo assembly of the short reads, making scaffolds from the resulting contigs, and then counting the abundance of each strain in the data. I heard that SPAdes is a good choice for these kinds of very short genomes, and that BBMap is useful for statistics related to contigs.
I am a complete newbie in everything related to de novo assembly, so I wonder if you could recommend some good courses, tutorials, or papers for learning all the steps, including the units, the specialized vocabulary, the analysis steps, and everything else.
I have some experience with analyzing RNA-seq and small RNA-seq data. I learned both using free online resources; however, it is very hard to find even one post or paper that describes a full genome assembly workflow.
Note: I know that I should read the documentation for SPAdes and BBMap, and I will. However, I'd like to first get a good idea of what the steps are, test them on some sample data, and then work through the individual tools. It is like saying that I need to learn RNA-seq analysis, not read a particular aligner's documentation.
Cross-posted on bioinformatics.stackexchange.
ARTIC Consortium: https://github.com/nf-core/viralrecon
Methods and tools used by the Nextstrain project: https://nextstrain.org/docs/getting-started/introduction#open-source-tools-for-the-community
Resources from CDC: https://github.com/CDCgov/SARS-CoV-2_Sequencing#bioinformatics
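On the vocabulary side, one contig statistic you will see in almost every assembly report is N50: the contig length L such that contigs of length L or longer together cover at least half of the total assembly. Below is a minimal Python sketch of that definition, just to make the term concrete. The function name `n50` and the example lengths are made up for illustration; this is not how any particular tool computes or reports it.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs given


# Hypothetical contig lengths (in bp), not taken from real data.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # 400
```

Here the total is 1500 bp, so half is 750 bp; the two longest contigs (500 + 400) already pass that, which is why the result is 400.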
Thank you so much for the links. The first one is very interesting. I'll read their github repo, read their paper, and maybe even try it using some sample data.
Please don't cross post. It's annoying.
Sorry. It is much harder to get an answer on bioinformatics sites than on Stack Overflow! I won't do it again.
There are fewer bioinformaticians (and fewer still with the exact domain knowledge) than CS folks, so yeah, it takes longer. Please have patience and keep trying to find answers by yourself in the meantime.
I'd like to add this very nice post myself.