3.4 years ago
mzzzzzzzzzz
Hi all, I'm new to genome assembly. I have long-read sequencing data, and I have already finished assembly, polishing, and scaffolding. I guess the next step is to manually curate the assembly? How do I know where the gaps in the assembly are?
For example, I have a QUAST figure like this for part of one chromosome aligned against the reference genome: Is the blank part around 13 Mb a huge gap? I find quite a few blank regions of this kind in my scaffolds, and I assume that they are not gaps. If this is true, then how can I see where the gaps are and which genes lie in the gap regions?
Without any more information it is impossible to tell whether this gap in your alignment is due to a deletion or to a gap introduced by scaffolding. This depends heavily on how you performed the scaffolding.
One initial step may be to visualise your genome against the reference using a dotplot prior to scaffolding. Maybe look at a very easy-to-use tool called D-GENIES. This will make it clear whether you are looking at gaps between contigs or within contigs.
In terms of manual curation, you can look at closing gaps between contigs with gap-joining tools, orienting the contigs to your reference, removing redundant contigs if the genome is heterozygous, etc.
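If the question is where the gaps sit inside scaffolds, here is a minimal sketch, assuming the scaffolder writes gaps as runs of `N` in the FASTA (most do, but check your tool). The function names `read_fasta` and `find_gaps` are just illustrative helpers, not part of any package. It reports each run as 0-based, half-open coordinates, BED style, which you could then intersect with a gene annotation.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(lines: Iterable[str], min_len: int = 1) -> List[Tuple[str, int, int]]:
    """Return (sequence name, start, end) for every run of N of at least min_len.

    Assumption: gaps are encoded as runs of N/n. Coordinates are 0-based
    and half-open, like BED.
    """
    gaps = []
    for name, seq in read_fasta(lines):
        for match in re.finditer(r"[Nn]+", seq):
            if match.end() - match.start() >= min_len:
                gaps.append((name, match.start(), match.end()))
    return gaps


if __name__ == "__main__":
    import sys

    # Usage: python find_gaps.py scaffolds.fasta > gaps.bed
    with open(sys.argv[1]) as handle:
        for name, start, end in find_gaps(handle):
            print(f"{name}\t{start}\t{end}")
```

If this prints nothing for your raw contigs, they contain no N runs, so any blank stretch in the alignment plot is something else (for example sequence missing from, or unaligned to, the reference).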
Thanks a lot for your reply! As you suggested, I used D-GENIES to generate a dot plot by aligning my assembly against the reference genome. Below is the alignment of chr1-chr4 (x axis) from left to right. The x axis is the reference and the y axis is the contigs from my assembly.
I have some questions about this dot plot. (1) How do I know whether chromosomes 1 and 2 have gaps? (The QUAST assessment shows no gaps in any of my contigs. Does this imply that there are no gaps in chromosomes 1 and 2, as each is formed by a single contig?) (2) How should I interpret the yellow lines in contig 1 (the bottom contig) that appear in chr2 and chr4? I think the yellow lines indicate gaps in the reference genome? (3) There is a contig (in black) in chr3 that is completely vertical. Does this mean it is highly identical to the reference but cannot be aligned to chr3 in the reference? I have difficulty understanding this point...
Also, how can I manually orient my contigs after aligning them to the reference? I tried minimap2 for the alignments, but I can't open the alignment file in IGV. Is there a better way to do it?
Thanks a lot in advance!
Very interesting.
If these are your raw contigs (i.e. no scaffolds/Ns present), then as you said, it implies there is no gap in chr1 or chr2, and perhaps even chr4 (assuming they are in order from left to right, with the reference along the bottom and your de novo assembly along the side).
You should look into what the dotplot is showing, but what I think the yellow lines suggest is that the gaps in your reference genome lie mainly within regions containing a large number of tandem repeats. It appears that with your assembly, thanks to the long reads, you have been able to assemble through several of these repeat regions. This is why the region extends vertically in your assembly. These repeat regions also appear to share homology across chromosomes, hence the yellow lines. They are probably something like centromeres, considering that each chromosome contains one and that they are similar across chromosomes.
Personally, to orient contigs I use scaffolding tools such as Ragout or RaGOO.
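If you only want a quick look at orientation from the minimap2 output you already have, here is a rough sketch. It is not how Ragout or RaGOO work internally; it is a simple majority vote that assumes the alignment file is in PAF format (minimap2's default output) with contigs as the query and the reference as the target. `orient_contigs` and `reverse_complement` are hypothetical helper names. For each contig it picks the reference sequence with the most aligned bases, then the strand covering more of those bases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def orient_contigs(paf_lines: Iterable[str], min_mapq: int = 0) -> Dict[str, Tuple[str, str]]:
    """Map each contig to (best reference sequence, '+' or '-').

    Assumption: standard PAF columns, i.e. query name (1), strand (5),
    target name (6), alignment block length (11), mapping quality (12).
    """
    # contig -> (target, strand) -> summed alignment block length
    totals = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12 or int(fields[11]) < min_mapq:
            continue
        contig, strand, target = fields[0], fields[4], fields[5]
        totals[contig][(target, strand)] += int(fields[10])

    result = {}
    for contig, by_key in totals.items():
        # Best target: most aligned bases, both strands combined.
        per_target = defaultdict(int)
        for (target, _strand), aligned in by_key.items():
            per_target[target] += aligned
        best = max(per_target, key=per_target.get)
        plus = by_key.get((best, "+"), 0)
        minus = by_key.get((best, "-"), 0)
        result[contig] = (best, "-" if minus > plus else "+")
    return result


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; characters other than ACGT are kept as-is."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


if __name__ == "__main__":
    import sys

    # Usage: python orient.py contigs_vs_ref.paf
    with open(sys.argv[1]) as handle:
        for contig, (target, strand) in sorted(orient_contigs(handle).items()):
            print(f"{contig}\t{target}\t{strand}")
```

Contigs reported with `-` are the ones you would reverse-complement before placing them along the reference. A contig spanning an inversion or a large repeat can fool a majority vote like this, so check the calls against the dotplot.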
Thank you, Samuel!