Help me choose an assembler (for something quite specific)
maxwhjohn1988, 7.1 years ago

I have a de novo assembly, produced with w2rap-contigger (a forked version of Discovar DeNovo, made at the Earlham Institute) from a single PE library. The assembly is not great (N50 ~ 55 kb) but it's not totally useless for me.

I am fishing for contigs of interest, and I am getting some good results. However, I don't know whether the contigs of interest are from the same chromosome(s) or whether they are scattered about the genome.

I am in the process of applying for some money to do some 10x Genomics sequencing, but while I wait I want to make use of some old MiSeq PE data which has been sitting around gathering dust. The MiSeq reads were never good enough to produce a decent assembly on their own, but I thought it was conceivable that they could improve my existing assembly.

I want to provide the contigs from the existing assembly, plus the MiSeq PE reads, as input to an assembler and see if I can improve anything. Discovar DeNovo (and w2rap) are designed for a single library, so I doubt it would be sensible to use them again for this. I was considering SPAdes, as it can do hybrid assembly, but that would require specifying the existing contigs as PacBio or Nanopore reads - presumably some kind of error correction would be applied as a result, which sounds undesirable.

Has anyone got any clever ideas, or solid reasons why this would be a waste of my time?

Tags: Assembly, next-gen, genome

h.mon, 7.1 years ago

I was considering SPAdes, as it can do hybrid assembly, but that would require specifying the existing contigs as PacBio or Nanopore reads - presumably there would be some kind of error-correction applied as a result of doing that, which sounds undesirable.

You are incorrect: SPAdes version 3.11.1 (and some earlier versions as well, though I don't know since which one) accepts contigs from other assemblers as input; see the parameters --trusted-contigs and --untrusted-contigs.
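
A minimal sketch of what such a SPAdes run might look like, built in Python so the choice between the two contig options is explicit. The file names and thread count are hypothetical placeholders; the options used (-1, -2, --trusted-contigs / --untrusted-contigs, -t, -o) are spades.py options, but check spades.py --help for your version before relying on them.

```python
import shlex
from typing import List


def spades_command(reads_1: str, reads_2: str, contigs: str, out_dir: str,
                   trusted: bool = True, threads: int = 8) -> List[str]:
    """Build a spades.py command that combines PE reads with existing contigs."""
    # Existing contigs are passed as contigs, not as long reads.
    contig_flag = "--trusted-contigs" if trusted else "--untrusted-contigs"
    return [
        "spades.py",
        "-1", reads_1,          # forward MiSeq reads (hypothetical file name)
        "-2", reads_2,          # reverse MiSeq reads (hypothetical file name)
        contig_flag, contigs,   # w2rap-contigger assembly (hypothetical file name)
        "-t", str(threads),
        "-o", out_dir,
    ]


if __name__ == "__main__":
    cmd = spades_command("miseq_R1.fastq.gz", "miseq_R2.fastq.gz",
                         "w2rap_contigs.fasta", "spades_with_contigs")
    print(shlex.join(cmd))
    # To launch it: import subprocess; subprocess.run(cmd, check=True)
```

As far as I recall, the SPAdes manual describes --trusted-contigs as reliable contigs of the same genome with few misassemblies, and --untrusted-contigs as contigs of average or unknown quality, so the untrusted option may be the more cautious starting point for a fragmented assembly.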

Thanks for the correction! You are quite right; I had completely forgotten about that option. Much appreciated, I might give this a try.

7.1 years ago

I would suggest merging the overlapping MiSeq read pairs with FLASH or a similar tool.

Then put all the contigs into SOAPdenovo2; it's quite easy, fast and flexible.
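
One way the SOAPdenovo2 input could be prepared, as a sketch under assumptions. SOAPdenovo2 cuts any read longer than max_rd_len, so long contigs cannot simply be listed as reads; the sketch below therefore shreds them into overlapping 250 bp pseudo-reads first (my assumption, not something specified above). The merged.* names are FLASH's default outputs for "flash -M 250 -o merged miseq_R1.fastq miseq_R2.fastq"; all other file names and the insert size are hypothetical, and the config layout should be checked against SOAPdenovo2's example config.

```python
import shlex
from pathlib import Path
from typing import Iterator, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with path.open() as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def shred(seq: str, read_len: int, step: int) -> Iterator[str]:
    """Cut one contig into overlapping pseudo-reads of at most read_len bp."""
    if len(seq) <= read_len:
        yield seq
        return
    for start in range(0, len(seq) - read_len + 1, step):
        yield seq[start:start + read_len]
    if (len(seq) - read_len) % step:
        yield seq[-read_len:]  # make sure the end of the contig is covered


def soap_config(max_rd_len: int, avg_ins: int, pe_1: str, pe_2: str,
                merged: str, pseudo_reads: str) -> str:
    """Build a SOAPdenovo2 config text with three libraries."""
    lines = [
        f"max_rd_len={max_rd_len}",
        # unmerged MiSeq pairs: contig building and scaffolding
        "[LIB]", f"avg_ins={avg_ins}", "reverse_seq=0", "asm_flags=3", "rank=1",
        f"q1={pe_1}", f"q2={pe_2}",
        # FLASH-merged pairs as single-end FASTQ: contig building only
        "[LIB]", "asm_flags=1", f"q={merged}",
        # shredded w2rap contigs as single-end FASTA: contig building only
        "[LIB]", "asm_flags=1", f"f={pseudo_reads}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(shlex.join(["flash", "-M", "250", "-o", "merged",
                      "miseq_R1.fastq", "miseq_R2.fastq"]))
    contigs = Path("w2rap_contigs.fasta")      # hypothetical input
    pseudo = Path("contigs_shredded.fasta")    # hypothetical output
    if contigs.exists():
        with pseudo.open("w") as out:
            for name, seq in read_fasta(contigs):
                for i, piece in enumerate(shred(seq, 250, 50)):
                    out.write(f">{name}_{i}\n{piece}\n")
    # 500 leaves room for merged 2x250 reads; 550 is a placeholder insert size
    print(soap_config(500, 550, "merged.notCombined_1.fastq",
                      "merged.notCombined_2.fastq",
                      "merged.extendedFrags.fastq", str(pseudo)))
```

The config would then go to something like "SOAPdenovo-63mer all -s soap.config -K 63 -p 8 -o soap_out", with the k-mer size as a value to tune.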

Having said that, I don't think you will improve your assembly very much. To check which chromosome each contig comes from, mapping the contigs to a well-sequenced related species might help.
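
If the contigs are aligned to the related genome with MUMmer (e.g. "nucmer --prefix=rel related_genome.fasta w2rap_contigs.fasta", then "show-coords -T -H rel.delta > rel.coords"), a rough chromosome assignment can be made by summing aligned bases per reference sequence. This is a sketch assuming the tab-separated show-coords layout in which field 6 is the aligned length on the contig, field 7 is the percent identity and the last two fields are the reference and contig names; the 90% identity cutoff and the contig names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def assign_contigs(coords_lines: Iterable[str],
                   min_identity: float = 90.0) -> Dict[str, str]:
    """Assign each contig to the reference sequence with the most aligned contig bases."""
    aligned: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for line in coords_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or unexpected lines
        contig_len = int(fields[5])    # aligned length on the contig (query)
        identity = float(fields[6])    # percent identity of this alignment
        ref_name, contig = fields[-2], fields[-1]
        if identity >= min_identity:
            aligned[contig][ref_name] += contig_len
    return {contig: max(hits, key=hits.get) for contig, hits in aligned.items()}


def group_by_chromosome(assignments: Dict[str, str],
                        contigs_of_interest: Iterable[str]) -> Dict[str, List[str]]:
    """Group the contigs of interest by assigned chromosome ('unplaced' if none)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for contig in contigs_of_interest:
        groups[assignments.get(contig, "unplaced")].append(contig)
    return dict(groups)


if __name__ == "__main__":
    example = [
        "1\t1000\t1\t1000\t1000\t1000\t99.5\tchr1\tcontig_7",
        "5000\t5400\t1200\t1600\t401\t401\t98.0\tchr3\tcontig_7",
        "200\t900\t1\t700\t701\t701\t85.0\tchr2\tcontig_9",
    ]
    placed = assign_contigs(example)
    print(placed)  # {'contig_7': 'chr1'}
    print(group_by_chromosome(placed, ["contig_7", "contig_9"]))
    # {'chr1': ['contig_7'], 'unplaced': ['contig_9']}
```

Here contig_9 ends up unplaced only because its single alignment falls below the identity cutoff; with a more distant relative that cutoff would need lowering.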

Thanks - yes, I'm also dubious about whether I will be able to make any improvements.

I am currently trying to get the nucmer output into an intelligible graphical format, having done exactly what you suggest :)
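
For a dotplot straight from MUMmer, one common route is nucmer, then "delta-filter -1" to keep one-to-one alignments, then "mummerplot --png" (which needs gnuplot installed). A sketch that builds and runs those three steps; the prefix and file names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Tuple


def dotplot_commands(reference: str, contigs: str,
                     prefix: str) -> Tuple[List[str], List[str], List[str]]:
    """Build the nucmer, delta-filter and mummerplot commands for one comparison."""
    nucmer = ["nucmer", f"--prefix={prefix}", reference, contigs]  # writes <prefix>.delta
    delta_filter = ["delta-filter", "-1", f"{prefix}.delta"]       # 1-to-1 hits, to stdout
    mummerplot = ["mummerplot", "--png", "-p", f"{prefix}_plot",
                  f"{prefix}.filtered.delta"]
    return nucmer, delta_filter, mummerplot


def run_dotplot(reference: str, contigs: str, prefix: str) -> None:
    """Run the three steps in order, stopping if any of them fails."""
    nucmer, delta_filter, mummerplot = dotplot_commands(reference, contigs, prefix)
    subprocess.run(nucmer, check=True)
    with open(f"{prefix}.filtered.delta", "w") as out:
        subprocess.run(delta_filter, check=True, stdout=out)
    subprocess.run(mummerplot, check=True)


if __name__ == "__main__":
    for cmd in dotplot_commands("related_genome.fasta",
                                "contigs_of_interest.fasta", "cmp"):
        print(shlex.join(cmd))
```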

I find the dotplot tool in UGENE to be excellent for comparing lots of different contigs rapidly. Its parameters are also easy to adjust.
