Genome Assembly From Large Insert Libraries
11.2 years ago
Rahul Sharma ▴ 660

Dear all,

I am trying to assemble a genome of around 80 Mb. I have four Illumina libraries with insert sizes of 300 bp, 1 kb, 8 kb and 12 kb; read lengths are 76-100 bp. I generated assemblies with both Velvet and ALLPATHS-LG. Velvet (k-mer = 55) gave an N50 of more than 2 Mb, and ALLPATHS-LG gave an N50 of around 1 Mb, so the assembly statistics look good. However, around 20% of the bases in the Velvet scaffolds are Ns, and around 16% in the ALLPATHS-LG scaffolds. My questions are:

(a) Is this usual with such long-insert libraries? (b) Should I turn off scaffolding in these assemblers and instead try stand-alone scaffolders such as BAMBUS2, SSPACE or GRASS (please suggest others)? (c) Do these assemblers also mask repeat elements during assembly/scaffolding, so that masked repeats account for the high percentage of Ns?
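For anyone who wants to reproduce the two numbers quoted above (N50 and percentage of Ns) on their own scaffolds, here is a minimal sketch in plain Python. The function names `parse_fasta`, `n50` and `n_fraction` are made up for illustration and are not part of Velvet or ALLPATHS-LG; the N50 convention assumed here is the length of the shortest scaffold in the smallest set of longest scaffolds covering at least half of the total assembly length.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records from an iterable of lines into {name: sequence}."""
    seqs: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                seqs[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def n_fraction(seqs: Iterable[str]) -> float:
    """Fraction of all bases that are N (case-insensitive)."""
    seqs = list(seqs)
    total = sum(len(s) for s in seqs)
    if total == 0:
        return 0.0
    return sum(s.upper().count("N") for s in seqs) / total
```

Typical use would be something like `with open("scaffolds.fa") as fh: seqs = parse_fasta(fh)` (the file name is a placeholder), followed by `n50(len(s) for s in seqs.values())` and `n_fraction(seqs.values())`. Comparing N50 on scaffolds against N50 on contigs is a quick way to see how much of the contiguity comes from gap-filled sequence versus Ns.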

I would really appreciate the suggestions.

Kind regards and wishes,

Rahul Sharma

velvet scaffolding illumina • 2.9k views
11.2 years ago
cts ★ 1.7k

It sounds to me like you either don't have enough read coverage or many of your reads are duplicates. In that case the long insert sizes let you link many contigs, but the contigs themselves can't be extended to fill the gaps between them. It might be worth checking the coverage and duplication rates of the reads.
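A rough way to run both checks, as a sketch only: expected coverage is simply total sequenced bases divided by genome size, and a crude duplication rate can be estimated by counting read pairs with identical sequences. The function names below are hypothetical, and exact-sequence matching is an assumption made for simplicity; it will miss duplicates that differ by a sequencing error, so treat the result as a lower bound.

```python
from collections import Counter
from typing import Iterable, Tuple


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def duplication_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of read pairs that are exact copies of an earlier pair.

    Each pair is (read1_sequence, read2_sequence). Holds all distinct
    pairs in memory, so subsample large libraries first.
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every occurrence beyond the first of a distinct pair is a duplicate.
    return (total - len(counts)) / total
```

For example, 40 million 100 bp reads on an 80 Mb genome give `expected_coverage(40_000_000, 100, 80_000_000)`, i.e. 50x. It is worth running this per library: a high duplication rate in the 8 kb and 12 kb libraries would mean far fewer independent links than the raw read count suggests.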
