Assembler for de novo assembly of a large genome with 150 bp Illumina paired-end reads
9.9 years ago
alecloic

Hello,

I need some advice. I have never done a genome assembly before. I have to perform a de novo assembly of a large genome (2.5 Gb) using short Illumina paired-end reads of 150 bp.
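
The numbers in the question translate directly into sequencing depth, which is worth knowing before picking an assembler. A minimal Python sketch, assuming the usual back-of-the-envelope formula (read pairs × 2 × read length ÷ genome size) and ignoring trimming, duplicates and quality filtering; the function names are hypothetical:

```python
import math

GENOME_SIZE_BP = 2_500_000_000  # 2.5 Gb, from the question
READ_LENGTH_BP = 150            # 150 bp reads, from the question


def coverage(read_pairs: int, read_len: int = READ_LENGTH_BP,
             genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean fold-coverage from `read_pairs` paired-end fragments (both mates counted)."""
    return read_pairs * 2 * read_len / genome_size


def pairs_needed(target_coverage: float, read_len: int = READ_LENGTH_BP,
                 genome_size: int = GENOME_SIZE_BP) -> int:
    """Read pairs required to reach `target_coverage`, rounded up."""
    return math.ceil(target_coverage * genome_size / (2 * read_len))


print(f"400M pairs -> {coverage(400_000_000):.1f}x")
print(f"pairs for 50x: {pairs_needed(50):,}")
```

This only estimates raw depth; the coverage an assembler actually sees will be lower after error correction and filtering.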

I looked into the different assemblers, but none seems to match my needs: there is always some criterion that rules them out (for example, ABySS, ALLPATHS-LG and SOAPdenovo work with much shorter reads, while others such as SPAdes do not work for genomes of this size).

Do you have an idea which short-read de novo assembler I could use? Which one would give the best results?

Kind regards,

A. GUYOMARD

France, Lyon

9.9 years ago

Allpaths and Soap both work fine with 150bp reads. I don't know what the best assembler is for genomes of that size, though.

Not Allpaths: it requires at least two libraries, one paired-end and one mate-pair (see B1 and B3 in https://www.broadinstitute.org/software/allpaths-lg/blog/?page_id=215).

Actually, we use Allpaths routinely here with only one library. You can feed Allpaths the short library again instead of a long library, or you can assemble the short library with something like Velvet and generate synthetic LMP reads from the contigs, which is the approach we take. It seems silly, but it works and gives good results.

Thanks for your comment. Just curious: did you check for misassemblies?

When testing a new assembler or assembly method, we use data from organisms with known reference genomes and run the assembly through QUAST, which counts misassemblies, to verify that the approach is valid.

QUAST with a reference genome is indeed a very good way to evaluate an assembly. If you had few or no misassemblies, then I'd be inclined to think it's fine.

I'd be curious to hear what the Allpaths developers think of this usage of their tool.

Do you have a script to generate synthetic LMP reads from a contig file that you could share, please? :)

You can use the BBMap package for that:

randomreads.sh ref=contigs.fa reads=1000000 out=lmp.fq paired interleaved len=150 mininsert=3600 maxinsert=4400

They come out in "innie" orientation; you can use reformat.sh with the rcomp or rcompmate flag to transform them to a different orientation if you need to.
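To illustrate what generating synthetic long mate-pair (LMP) reads from contigs involves, here is a minimal Python sketch. It is not BBMap's implementation: the function names, the uniform insert-size distribution, the length-weighted contig choice and the constant quality character are all assumptions for illustration. The defaults mirror the command above (150 bp reads, 3600 to 4400 bp inserts). Read 1 is taken from the left end of each sampled fragment and read 2 is the reverse complement of its right end, i.e. "innie" (FR) pairs; reverse-complementing both mates turns them into "outie" (RF) pairs.

```python
import random

# Complement table for DNA; N maps to N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def synthetic_pairs(contigs, n_pairs, read_len=150, min_insert=3600,
                    max_insert=4400, orientation="innie", seed=0):
    """Yield (read1, read2) pairs sampled from `contigs` (hypothetical helper).

    Only contigs at least `max_insert` long are used, so every fragment fits.
    """
    rng = random.Random(seed)
    usable = [c for c in contigs if len(c) >= max_insert]
    if not usable:
        raise ValueError("no contig is at least max_insert bp long")
    weights = [len(c) for c in usable]  # longer contigs contribute more pairs
    for _ in range(n_pairs):
        contig = rng.choices(usable, weights=weights, k=1)[0]
        insert = rng.randint(min_insert, max_insert)
        start = rng.randint(0, len(contig) - insert)
        fragment = contig[start:start + insert]
        r1 = fragment[:read_len]             # forward strand, left end
        r2 = revcomp(fragment[-read_len:])   # reverse strand, right end (FR)
        if orientation == "outie":
            r1, r2 = revcomp(r1), revcomp(r2)  # RF orientation
        yield r1, r2


def write_interleaved_fastq(pairs, path, qual_char="I"):
    """Write pairs as interleaved FASTQ with a constant (assumed) quality."""
    with open(path, "w") as fh:
        for i, (r1, r2) in enumerate(pairs, start=1):
            for mate, seq in ((1, r1), (2, r2)):
                fh.write(f"@synth_{i}/{mate}\n{seq}\n+\n{qual_char * len(seq)}\n")
```

For example, `write_interleaved_fastq(synthetic_pairs(contigs, 1_000_000), "lmp.fq")` would write one million pairs, where `contigs` is a list of sequence strings you have already loaded from your assembly.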

You can feed Allpaths the short library again instead of a long library, or you can assemble the short library with something like Velvet and generate synthetic LMP reads from the contigs, which is the approach we take.

Is there a recommendation on which option might work better: the short library again vs. synthetic LMPs?

Or does it differ on a case-by-case basis, and if so, how does one determine which option would better serve one's genome assembly goals?

And I wonder whether ALLPATHS-LG, for a medium-sized haploid eukaryotic genome (~50 Mb), has been empirically shown to be any better or worse than a5miseq, SPAdes, or ABySS. I'm comparing assemblers to pick one, but at some point I have to stop my comparative analyses and move on with the "chosen" one. Hence this question.

I hope you don't mind me tagging you two here: Rayan Chikhi and Brian Bushnell. Thanks!

Actually, as far as I know we don't do that anymore :) I'm not sure whether it's a good idea or not, or what procedure (if any) was used to validate that it did not lead to misassemblies. So if you do go that route, I suggest validating it on genomes with finished references first.

We have extensively tested AllPaths against other assemblers multiple times, but assembly results can be very version-specific, and SPAdes especially has changed a lot since the last test.

SPAdes tends to be our best microbial assembler, but I'm not sure how it does on fungi.

9.9 years ago
rtliu

If you don't have a big-memory machine (512 GB+), you could try Minia.
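Part of why Minia needs far less memory is that its original design stores the de Bruijn graph's k-mers in a Bloom filter (plus a separate set of "critical false positives") instead of storing each k-mer explicitly. A rough Python sketch of the standard Bloom filter sizing formula, assuming about 2.5 billion distinct solid k-mers (roughly one per base of a 2.5 Gb genome after error k-mers are filtered out); it ignores Minia's additional structures, so treat it as an illustration of scale, not a prediction of Minia's actual memory use:

```python
import math


def bloom_bits_per_element(fp_rate: float) -> float:
    """Optimal Bloom filter size in bits per stored element: -ln(p) / (ln 2)^2."""
    return -math.log(fp_rate) / (math.log(2) ** 2)


def bloom_num_hashes(fp_rate: float) -> int:
    """Optimal number of hash functions, (m/n) * ln 2, rounded to an integer."""
    return max(1, round(bloom_bits_per_element(fp_rate) * math.log(2)))


def gib(n_bits: float) -> float:
    """Convert a size in bits to GiB."""
    return n_bits / 8 / 2**30


# Assumption: ~2.5e9 distinct solid k-mers for a 2.5 Gb genome.
N_KMERS = 2_500_000_000

for p in (0.01, 0.05):
    total_bits = bloom_bits_per_element(p) * N_KMERS
    print(f"p={p}: {bloom_bits_per_element(p):.2f} bits/k-mer, "
          f"{bloom_num_hashes(p)} hashes, {gib(total_bits):.1f} GiB")

# Comparison: each k-mer (k <= 32) stored explicitly as a 64-bit integer,
# before any hash-table overhead.
print(f"explicit 64-bit k-mers: {gib(64 * N_KMERS):.1f} GiB")
```

Even with generous overheads, a few GiB for the filter is in a different class from the tens of GiB needed just to list the k-mers explicitly, let alone a full hash-table-based graph.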

Manual: http://minia.genouest.org/files/manual.pdf

9.9 years ago
alecloic

Hello,

Thank you for your advice; it will help me make my choice.

Regarding ALLPATHS-LG and SOAPdenovo, I thought they needed shorter reads, so I may use SOAPdenovo after all. I can't use ALLPATHS-LG because "ALLPATHS-LG requires a minimum of 2 paired-end libraries - one short and one long", and I do not have that.

Minia could indeed be a good solution; I had not come across this software in my research. I have not yet seen IDBA-UD tested on a large genome, but why not.

Best regards,

A. GUYOMARD
France, Lyon

9.9 years ago
5heikki

How about IDBA-UD?

Not sure whether it works for Gbp-sized genomes.

9.9 years ago
alecloic

In case it is useful to anyone, there seem to be other interesting tools for this case: JR-Assembler, Contrail and maybe Gossamer.
