Comparing Assemblies and Breaking Inconsistent Contigs
6.6 years ago
EarlyEvol ▴ 30

I have three assemblies of a species with an estimated genome size of 360. The assemblies were generated with Abyss (PE/MP), MaSuRCa (PE/MP/PacBio), and DBG2OLC (PE/PacBio). According to BUSCO analysis, their gene content does not completely overlap, so I would like to merge them to capture more of the genome and potentially improve the contiguity of the final assembly in the process. After two rounds of Quickmerge (masurca+(abyss+DBG2OLC)), the N50 increases substantially, from 90 (Abyss), 86 (DBG2OLC), and 250 (MaSuRCa) to ~500kb for the final assembly. This seemed positive until I assessed the BUSCO content of the final assembly. In the summaries below, C = complete, S = single copy, D = duplicated, F = fragmented, M = missing, and n = total number of BUSCOs.

Abyss: C:94.5%[S:93.1%,D:1.4%],F:3.3%,M:2.2%,n:1658

DBG2OLC: C:92.6%[S:91.9%,D:0.7%],F:2.2%,M:5.2%,n:1658

QM(abyss+DBG2OLC): C:94.0%[S:90.4%,D:3.6%],F:1.4%,M:4.6%,n:1658

MaSuRCa: C:97.0%[S:95.1%,D:1.9%],F:0.7%,M:2.3%,n:1658

QM(masurca + QM(abyss+DBG2OLC)): C:96.1%[S:87.0%,D:9.1%],F:0.8%,M:3.1%,n:1658

Two odd things are happening. 1) BUSCOs that were previously identified in the query assembly (Abyss for the first round) are being lost during merging. After talking with Mahul, the Quickmerge developer, I think this is due to misassemblies in either the Abyss or the DBG2OLC assembly: when an alignment ends prematurely, the remainder of the contig is clipped. 2) The more assemblies are merged, the greater the duplicated BUSCO content. The sequencing reads came from a very highly inbred culture, so unless roughly 1 in 10 BUSCOs really is duplicated, I think this is also due to misassemblies in the input assemblies.

Sorry, that was a lot of background for my question. Is there a tool to identify inconsistencies between two (or three) draft assemblies? It really seems like something that would already have been written, but I have searched around a lot and can't find anything. I think it wouldn't be too hard to write a script that pulls this information from a MUMmer delta file; it would then be easy to extract the broken-up contigs with bedtools.
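To make the delta-file idea concrete, here is a minimal sketch, not an existing tool. It assumes a nucmer delta file in which each `>` header gives the reference ID, query ID, and both sequence lengths, and each 7-field line gives reference start/end and query start/end (query start > end for reverse-strand hits). The function names, the `slop` and `min_len` thresholds, and the rule itself (an alignment end is "premature" if neither sequence runs out within `slop` bases of it) are my own assumptions for illustration. The demo delta text is made up.

```python
from collections import namedtuple

Aln = namedtuple("Aln", "ref qry ref_len qry_len rs re qs qe")


def parse_delta(lines):
    """Collect alignment coordinates from the lines of a nucmer delta file."""
    alns, header = [], None
    for line in lines:
        fields = line.split()
        if line.startswith(">"):
            # Header: >ref_id qry_id ref_len qry_len
            header = (fields[0][1:], fields[1], int(fields[2]), int(fields[3]))
        elif header and len(fields) == 7:
            # Alignment line: ref_start ref_end qry_start qry_end + 3 error counts
            rs, re_, qs, qe = map(int, fields[:4])
            alns.append(Aln(*header, rs, re_, qs, qe))
        # Single-integer lines (indel offsets) and the two file-header lines are skipped.
    return alns


def premature_ends(alns, slop=100, min_len=1000):
    """Return (qry, qry_pos, ref, ref_pos) for alignment ends that stop
    more than `slop` bases short of a terminus on BOTH sequences.
    Thresholds are illustrative assumptions, not tuned values."""
    breaks = set()
    for a in alns:
        if a.re - a.rs + 1 < min_len:
            continue  # ignore short hits, which are often repeats
        forward = a.qs < a.qe
        # Distance from each alignment end to the query terminus it heads toward.
        q_left = a.qs - 1 if forward else a.qry_len - a.qs
        q_right = a.qry_len - a.qe if forward else a.qe - 1
        if a.rs - 1 > slop and q_left > slop:
            breaks.add((a.qry, a.qs, a.ref, a.rs))
        if a.ref_len - a.re > slop and q_right > slop:
            breaks.add((a.qry, a.qe, a.ref, a.re))
    return sorted(breaks)


def to_bed(breaks):
    """Yield 1-bp BED lines (0-based start) at the query-side positions."""
    for qry, q_pos, _ref, _r_pos in breaks:
        yield f"{qry}\t{q_pos - 1}\t{q_pos}"


# Made-up example: qry1 overlaps ref1 cleanly, but its alignment to ref2
# stops in the middle of both sequences; qry2 is a clean reverse-strand overlap.
DEMO_DELTA = """\
/path/ref.fa /path/qry.fa
NUCMER
>ref1 qry1 10000 8000
1 5000 3001 8000 10 10 0
0
>ref2 qry1 6000 8000
1 3000 1 3000 5 5 0
-20
0
>ref3 qry2 5000 4000
1001 5000 4000 1 8 8 0
0
"""

for bed_line in to_bed(premature_ends(parse_delta(DEMO_DELTA.splitlines()))):
    print(bed_line)
```

On a real file the same functions could be called as `parse_delta(open("out.delta"))`, where `out.delta` is a placeholder name. A flagged position only says that the two assemblies disagree there, not which one is wrong, and repeats will produce false positives, so filtering the alignments first (for example with MUMmer's `delta-filter`) and checking flagged sites against read alignments would probably be needed before breaking any contig.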

Any help with this would be super duper appreciated, and would save me a bunch of time writing code that most likely already exists (and might not help anything).

Thanks, Earl

genome quickmerge masurca DBG2OLC abyss • 1.5k views