15 months ago
rj.rezwan
Hi, I made scaffolds using ragtag with assembled_genome.fasta as the query and an already published genome as the reference. I then made a dot plot to compare the two genomes. The reference genome has a lower BUSCO score than our assembled genome. I have attached the dot plot here. How can I check whether these inversions are real, or whether they are artifacts of the ragtag scaffolding?
Ragtag is unlikely to scaffold a region in the opposite orientation by mistake, so most likely the assembly really does contain an inversion. However, if your new scaffolded assembly was made using short reads, I would be careful about accepting the structure as truth. A quick thing to do is to break up the scaffolds while maintaining their order and then regenerate the dot plot, so you can easily see where the contig edges fall relative to the inverted region.
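The "break up the scaffolds" step can be sketched in Python. This is a minimal sketch, assuming the scaffold gaps are written as runs of N in the FASTA; the function names, the `_part` suffix, and the `min_gap` threshold are made up for illustration and are not part of ragtag.

```python
import re


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_scaffold(name, seq, min_gap=10):
    """Split one scaffold at runs of at least min_gap N's.

    Pieces are returned in their original order and numbered from 1,
    so the contig order along the scaffold is preserved.
    min_gap is an assumed threshold; adjust it to your gap size.
    """
    pieces = [p for p in re.split(r"[Nn]{%d,}" % min_gap, seq) if p]
    return [(f"{name}_part{i}", p) for i, p in enumerate(pieces, start=1)]


def split_fasta(in_path, out_path, min_gap=10):
    """Write every scaffold in in_path as gap-free pieces to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for name, seq in read_fasta(src):
            for part_name, part_seq in split_scaffold(name, seq, min_gap):
                dst.write(f">{part_name}\n{part_seq}\n")
```

Calling `split_fasta("scaffolds.fasta", "scaffolds.split.fasta")` (both file names are placeholders) would give a FASTA you can align to the reference again to redraw the dot plot with visible contig boundaries.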
The contigs were assembled using HiFi long reads. Is there a chance that repeat regions in the genome cause ragtag to produce the missing or inverted segments in that portion? As I mentioned, the BUSCO score of the reference is lower (93%) than that of our assembled genome (97%). So maybe the comparison itself is problematic and produces the missing regions and inversions: the majority of the inversions are after the scaffold breaks, which may suggest that this genome is not appropriate to use as a reference here.
In my experience, a lower BUSCO score is not generally caused by repeat regions, so I don't think that is related. For long-read assemblies it is usually due to accuracy (the assembly requires polishing) or to some contigs having been removed after assembly.
I am not sure I understand "majority of inversions are after the scaffold breaks". However, as I suggested before, I think you should check whether any of your assembled contigs align to the reference in both orientations, and therefore capture at least one edge of the inversion. You can also check whether the inversion is supported by the reads.
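One way to look for contigs that align in both orientations is to scan a PAF file of the contigs (before scaffolding) aligned to the reference, for example one produced by minimap2. The sketch below assumes the standard 12 mandatory PAF columns; the function name and the `min_block` and `min_mapq` thresholds are made up for illustration and should be tuned to your data.

```python
from collections import defaultdict


def contigs_with_both_strands(paf_lines, min_block=10000, min_mapq=20):
    """Return (contig, reference_sequence) pairs aligned on both strands.

    paf_lines: iterable of PAF text lines (contigs as query).
    min_block: ignore alignments shorter than this (assumed threshold).
    min_mapq:  ignore alignments with lower mapping quality (assumed).
    """
    aligned = defaultdict(lambda: {"+": 0, "-": 0})
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        query, strand, target = fields[0], fields[4], fields[5]
        block_len, mapq = int(fields[10]), int(fields[11])
        if strand not in ("+", "-"):
            continue
        if block_len < min_block or mapq < min_mapq:
            continue  # skip short or ambiguous (often repeat) hits
        aligned[(query, target)][strand] += block_len
    # A contig with substantial alignment on both strands of the same
    # reference sequence is a candidate for spanning an inversion edge.
    return sorted(k for k, v in aligned.items() if v["+"] > 0 and v["-"] > 0)
```

A contig reported here was assembled from the reads alone, so if it crosses an inversion breakpoint, the orientation change did not come from the scaffolding step. It is still only a candidate: inspect the alignment coordinates and the read support at the breakpoint before drawing a conclusion.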