Hey everyone
denovo, assembly, illumina, short read, plant, genome, beginner
I have a question regarding the de novo assembly of a plant genome.
So I have 13 Gb of paired-end short-read (Illumina) sequencing data (roughly 10x coverage) for a diploid plant species. Not much is known about the genetics of the plant, and my research aims to elucidate some basic genome characteristics.
I want to generate a very basic de novo reference genome for future work, but I do not need to annotate it. I just want to assemble it and use it for alignment if necessary. I have already done this using Megahit to produce a reference file.
My question is: will this cause any problems downstream? I know that Megahit is mostly used for metagenome assembly, but sources state that it can also be used for a basic assembly of a single genome, and I interpreted that to include any short-read sequencing data.
What do you guys think? Should I try another assembler? If so, what would you guys recommend for low-coverage plant genomes?
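For anyone checking the numbers: coverage is just total sequenced bases divided by genome size, so 13 Gb at about 10x implies a genome of roughly 1.3 Gb. A minimal sketch of that arithmetic in Python (the function names are made up for illustration, and the 13 Gb and 10x figures are the approximate values from the post):

```python
def estimate_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size implied by a given yield and average coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_bases / coverage


if __name__ == "__main__":
    total_bases = 13e9   # 13 Gb of reads (approximate, from the post)
    coverage = 10        # roughly 10x (approximate, from the post)
    size = implied_genome_size(total_bases, coverage)
    print(f"Implied genome size: {size / 1e9:.1f} Gb")
    print(f"Coverage check: {estimate_coverage(total_bases, size):.1f}x")
```

This ignores reads lost to trimming or contamination, so the effective coverage after quality filtering may be lower than the nominal figure.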
Lemonhope
Hi, so it seems you have an idea of the size of the genome, right? Do you also know if it's a selfing species? The last assembler we successfully used with Illumina data was Platanus. There's a nice tutorial here: https://bioinformaticsworkbook.org/dataAnalysis/GenomeAssembly/Arabidopsis/AT_platanus-genome-assembly.html
I do have an idea of the genome size, yes, but I don't know whether the species is able to self.
Thank you for the recommendation!
Hey, Lemonhope here.
So maybe I was unclear in my original post, sorry, it was a bit late when I posted! My goal is simply to have a set of scaffolds that can be used to work out some basic genome characteristics. For example, I was wondering if I could run BUSCO on it to assess how many conserved genes can be identified. I was also wondering if it could serve as a pseudo-reference for calling SNPs, as I have a large amount of ddRAD data that I could align to this genome.
My question is essentially: "Would Megahit be an acceptable assembler given these goals?"
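One quick way to judge whether a set of contigs is usable for BUSCO or as a pseudo-reference is to look at its contiguity first, since very short contigs tend to split genes and leave little flanking sequence for read alignment. Below is a minimal sketch in Python that computes N50 and a few totals from a FASTA file. It assumes a plain uncompressed FASTA; the function names and the `assembly.fa` path are placeholders, not anything produced by a specific assembler.

```python
import sys


def read_fasta_lengths(path):
    """Return the length of every sequence in a plain-text FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if seen_header:
                    lengths.append(current)
                current = 0
                seen_header = True
            elif line:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    if not lengths:
        raise ValueError("no sequences given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


if __name__ == "__main__":
    # Hypothetical default path; pass the real assembly file as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "assembly.fa"
    lengths = read_fasta_lengths(path)
    print(f"contigs: {len(lengths)}")
    print(f"total length: {sum(lengths)} bp")
    print(f"longest contig: {max(lengths)} bp")
    print(f"N50: {n50(lengths)} bp")
```

Comparing the total length against the expected genome size gives a rough idea of how much of the genome was recovered, whichever assembler produced the file.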
Lemonhope