De novo genome assembly
5.3 years ago

Hi,

I have been trying to assemble a genome that, based on flow cytometry, should be between 1.2 and 1.5 Gb. I have Illumina (more than 100x), PacBio (50x), 10x, and Hi-C reads. The best assembly we have obtained so far is 450 Mb and 93 percent complete. It seems that the assembly collapses despite our having data from all the newest sequencing technologies. We have tried Canu, MaSuRCA, FALCON, Supernova, and miniasm. I should mention that flow cytometry suggests the genome is diploid, but bioinformatic analysis suggests both diploid and tetraploid. So my questions are: does anyone have an idea why the assembly collapses, and is there any assembly software I might not be aware of that can handle all these different read types and produce a better assembly? Any suggestions and directions are greatly appreciated.

All the best,
Pezhman Safdari

Genome sizing by flow cytometry is an approximation, as is ploidy estimation. The assembly also depends on the genome's complexity at the sequence level: even with 100x in every technology, a complete genome sequence is definitely not guaranteed.

Polyploidy will certainly influence your assembly result. Have you checked whether the data might indeed be polyploid, and if so, how did you do it?

An interesting and fairly straightforward approach to estimating genome size (and, to some extent, ploidy) is to make k-mer frequency plots. A useful website for this is GenomeScope.
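The core arithmetic behind a k-mer based size estimate can be sketched in a few lines: count canonical k-mers, then divide the total number of k-mer occurrences by the depth of the main coverage peak. This is a toy under stated assumptions (error-free reads, one dominant peak, an arbitrary error cutoff `min_depth=2`), and the function names are hypothetical. GenomeScope itself fits a model that also accounts for heterozygosity, which this sketch does not; on real data the histogram would come from a dedicated k-mer counter such as Jellyfish.

```python
import random
from collections import Counter
from typing import Dict, Iterable

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement,
    so both strands of the same locus are counted together."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(reads: Iterable[str], k: int) -> Dict[int, int]:
    """Histogram: multiplicity -> number of distinct canonical k-mers seen that often."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 2) -> float:
    """Naive estimate: total k-mer occurrences / depth of the tallest peak.
    Depths below min_depth are treated as sequencing errors and ignored.
    Assumption: the tallest remaining peak is the homozygous one; in a
    heterozygous genome it may sit at half depth and inflate the estimate."""
    filtered = {d: n for d, n in hist.items() if d >= min_depth}
    if not filtered:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(filtered, key=lambda d: filtered[d])
    total_kmers = sum(d * n for d, n in filtered.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Hypothetical toy data: ten error-free copies of a random 2,000 bp "genome" (10x).
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    hist = kmer_histogram([genome] * 10, k=21)
    print(estimate_genome_size(hist))  # 1980.0, the number of 21-mer positions
```

The heterozygosity caveat in `estimate_genome_size` is exactly where ploidy enters: in a heterozygous diploid or tetraploid, extra peaks appear at fractions of the main depth, which is why a model-fitting tool is preferable to picking the tallest peak.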

If it turns out to be polyploid, have a look in the literature to see how others have tackled this (e.g. the cotton, soybean, and wheat genomes). I know there is also specific software for assembling highly heterozygous genomes; off the top of my head, Platanus and FALCON-Unzip are two of them.

What do you mean by 93 percent complete (BUSCO complete, or BUSCO complete + fragmented)?
One of the best approaches nowadays is PacBio + Hi-C with FALCON-Phase. You can then correct the result with the Illumina reads. Otherwise, did you try scaffolding your PacBio assembly with your 10x data?
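To make the "complete vs. complete + fragmented" distinction explicit, a BUSCO one-line summary can be split into its fields. This is a minimal sketch assuming the standard `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout; the function names and the example line are hypothetical, not values from this thread. The duplicated fraction is worth reporting too: a collapsed assembly tends to show low D, while a high D can indicate retained haplotypes.

```python
import re
from typing import Dict

# Matches the one-line BUSCO summary, e.g. "C:93.0%[S:85.1%,D:7.9%],F:2.5%,M:4.5%,n:1614".
# Assumption: standard layout; any extra trailing fields are ignored by re.search.
_BUSCO_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> Dict[str, float]:
    """Complete, Single, Duplicated, Fragmented, Missing (percent) plus n."""
    match = _BUSCO_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    scores: Dict[str, float] = {k: float(match.group(k)) for k in ("C", "S", "D", "F", "M")}
    scores["n"] = int(match.group("n"))
    return scores


def describe(scores: Dict[str, float]) -> str:
    """Spell out both readings of 'percent complete', plus the duplicated share."""
    return (
        f"complete only: {scores['C']:.1f}%; "
        f"complete + fragmented: {scores['C'] + scores['F']:.1f}%; "
        f"duplicated: {scores['D']:.1f}%"
    )


if __name__ == "__main__":
    # Hypothetical summary line for illustration only.
    example = "C:93.0%[S:85.1%,D:7.9%],F:2.5%,M:4.5%,n:1614"
    print(describe(parse_busco_summary(example)))
```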

5.3 years ago
Sam

Have you tried this combination: correcting the PacBio reads with Canu, then a hybrid assembly (corrected PacBio + Illumina) with MaSuRCA, polishing with Pilon if necessary?

There is also a concept called meta-assembly, which combines the assemblies from multiple assemblers to create the best one. Follow the link for more info: https://www.nature.com/articles/s41588-018-0110-3#Sec3
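Before combining assemblies, it helps to compare the candidates on the same footing. The sketch below is not the meta-assembly method from the linked paper; it is only a simple comparison of contig count, total length, N50, and the fraction of the expected genome size each assembly covers. Function names and toy data are hypothetical. With the numbers from the question, 450 Mb against an expected 1.2 to 1.5 Gb gives a fraction of about 0.30 to 0.375, which makes the collapse easy to see.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence lengths from FASTA text; an open file handle also works."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)  # a header starts a new record
        elif line and lengths:
            lengths[-1] += len(line)  # sequence may wrap over several lines
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs")


def summarize(lengths: List[int], expected_size: float) -> Dict[str, float]:
    """Contiguity and size metrics for one assembly against an expected genome size."""
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50(lengths),
        "fraction_of_expected": total / expected_size,
    }


if __name__ == "__main__":
    # Hypothetical toy FASTA; real input would be an open contigs file per assembler.
    toy_fasta = [">c1", "ACGTACGTAC", "ACGTACGTAC", ">c2", "ACGTA", ">c3", "AC"]
    lengths = fasta_lengths(toy_fasta)  # [20, 5, 2]
    print(summarize(lengths, expected_size=50))
```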
