6.5 years ago
mks002
Hello all, I tried assembling a plant genome (size 2.8 Gb) using MaSuRCA. My concern is that MaSuRCA predicted a ~2.9 Gb assembly, but the finished assembly came out at ~3.5 Gb.
The assembly N50 is ~100 kb. Can anyone suggest whether we can still improve the assembly and get the expected assembly size?
Any suggestion is appreciated.
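For reference when comparing assemblies of different total size: N50 is the length L such that sequences of length ≥ L cover at least half of the assembly's total length. When the assembly (~3.5 Gb) is larger than the expected genome (~2.9 Gb), NG50 uses the expected genome size as the denominator instead, so the two numbers can differ. A minimal Python sketch of both, using made-up toy lengths rather than the poster's data:

```python
def n50(lengths, genome_size=None):
    """N50 (or NG50 when genome_size is given) of a list of sequence lengths.

    N50: the length L such that sequences of length >= L together cover at
    least half of the assembly's total length. NG50 uses an expected genome
    size as the denominator instead of the assembly total.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input, or lengths never reach half of the target


# Toy scaffold lengths (hypothetical, in bp).
example = [100, 80, 60, 40, 20]
print(n50(example))                   # 80  (assembly total 300, half = 150)
print(n50(example, genome_size=200))  # 100 (expected genome smaller than assembly)
```

With an inflated assembly the NG50 against the lab estimate will be at least as large as the plain N50, so it is worth stating which one is being compared to a target.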
Hi mks002,
Can you please add more details, like what data you have (i.e., coverage, data size, platforms; is it hybrid?)
Of course, there is scope for improvement. What is the minimum scaffold size? How many gaps are there? What plant is it? Are there many repeats in the genome?
Update: I have updated the title of your post.
Thanks Vijay Lakhujani. We are working with ~130X data in total: Illumina WGS data (110X), 3 mate-pair datasets (5X) and PacBio data (13X). That is around 1000 million WGS reads and 40 Gb of PacBio reads.
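As a quick sanity check on those numbers, sequencing depth is simply total sequenced bases divided by genome size. A small Python sketch using the figures quoted in this thread (the 2.9 Gb lab estimate and 40 Gb of PacBio reads); the Illumina read length is not given, so Illumina depth is not recomputed here:

```python
def coverage(total_bases, genome_size):
    """Sequencing depth (X) = total sequenced bases / genome size."""
    return total_bases / genome_size


GENOME_SIZE = 2.9e9  # lab estimate quoted in the thread

pacbio_bases = 40e9  # "40 Gb of PacBio reads"
print(f"PacBio: {coverage(pacbio_bases, GENOME_SIZE):.1f}X")  # PacBio: 13.8X

# Per-platform depths as stated in the post; they add up to roughly the quoted ~130X.
stated = {"Illumina WGS": 110, "mate pair": 5, "PacBio": 13}
print(sum(stated.values()))  # 128
```

The PacBio figure is consistent with the stated 13X, and the per-platform depths sum to ~128X.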
On assembly stats
Repeat masking on this assembly identified ~13% repeats, but I think the true repeat content will be higher. My concern is the genome size, which should be either ~2.9 Gb or ~3.1 Gb, and the N50, which I expected to be ~500 kb.
I am looking for a solution, as I am short on time.
Those commas are in strange positions in that table; is that a formatting issue?
How (or why) do you conclude you should get an N50 of 500 kb?
Do you have any idea about the level of heterozygosity of that genome, or the ploidy of the species?
Hello lieven.sterck, thank you for your reply. The commas are only there to make the digits easier to count. Yes, I was hoping for an N50 of 500 kb because I performed another, non-hybrid assembly, which has around 12% N's.
It's a diploid species.
Hello Vijay Lakhujani, can you describe what approaches there are for improving the assembly?
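One point relevant to comparing the two assemblies: N's inside scaffolds count toward scaffold length, so a scaffold N50 from an assembly with 12% gaps partly reflects scaffolding rather than contiguous sequence. A minimal Python sketch that reports the N fraction and compares scaffold N50 with contig N50 (contigs obtained by splitting scaffolds at runs of N). It assumes the sequences have already been loaded as strings, e.g. by any FASTA parser; the toy sequences are hypothetical:

```python
import re


def n50(lengths):
    """Length L such that sequences >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def gap_stats(sequences):
    """Return (fraction of N bases, contig lengths) for scaffold sequences.

    Contigs are the pieces left after splitting each scaffold at runs of N/n.
    """
    total = 0
    n_bases = 0
    contig_lengths = []
    for seq in sequences:
        total += len(seq)
        n_bases += seq.upper().count("N")
        contig_lengths.extend(len(c) for c in re.split(r"[Nn]+", seq) if c)
    n_fraction = n_bases / total if total else 0.0
    return n_fraction, contig_lengths


# Toy scaffolds (hypothetical): the first one contains a 4 bp gap.
scaffolds = ["ACGTNNNNACGTACGT", "ACGTAC"]
frac, contigs = gap_stats(scaffolds)
print(f"N fraction: {frac:.1%}")                          # N fraction: 18.2%
print("scaffold N50:", n50([len(s) for s in scaffolds]))  # scaffold N50: 16
print("contig N50:", n50(contigs))                        # contig N50: 6
```

Comparing contig N50 values between the hybrid and non-hybrid assemblies is therefore a fairer basis than comparing a gapped scaffold N50 with an ungapped one.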
You could try to get an estimate of the genome size by analyzing k-mer frequency plots (e.g., using the GenomeScope website). How certain are you of that given size?
For the genome size, the lab estimate came to around 2.9 Gb, and the MaSuRCA assembler also predicted around 2.9 Gb at the initial assembly step. My non-hybrid approach also produced a ~2.9 Gb genome, but with 12% gaps.
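The basic idea behind a k-mer based size estimate is: genome size ≈ (total k-mers, excluding low-depth error k-mers) / (depth of the main k-mer peak). GenomeScope fits a full model that also accounts for heterozygosity; the naive Python sketch below does not. It assumes a two-column "depth count" histogram such as `jellyfish histo` writes, and uses a hypothetical toy histogram and error cutoff:

```python
def parse_histogram(lines):
    """Parse whitespace-separated 'depth count' lines into {depth: count}."""
    hist = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            hist[int(fields[0])] = int(fields[1])
    return hist


def estimate_genome_size(histogram, error_cutoff):
    """Naive genome size estimate from a k-mer frequency histogram.

    Depths <= error_cutoff are treated as sequencing errors and ignored.
    Estimate = total k-mers above the cutoff / depth of the tallest peak.
    """
    kept = {d: c for d, c in histogram.items() if d > error_cutoff}
    total_kmers = sum(d * c for d, c in kept.items())
    peak_depth = max(kept, key=kept.get)  # raises ValueError if nothing is kept
    return total_kmers / peak_depth


# Toy histogram (hypothetical): error k-mers at depth 1-3, a small
# heterozygous peak around 10 and the main homozygous peak at 20.
toy = parse_histogram([
    "1 5000", "2 800", "3 100",
    "9 50", "10 100", "11 50",
    "19 200", "20 400", "21 200",
])
print(estimate_genome_size(toy, error_cutoff=3))  # 900.0
```

Because this sketch simply takes the tallest peak, a highly heterozygous diploid whose half-depth (heterozygous) peak is taller would roughly double the estimate, which is one reason the heterozygosity question matters here.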