I am doing Sanger sequencing of a ~2 kb construct using 4 primer pairs. I get back 4 .ab1 files, each generally with around 1 kb of high-quality sequence, and given the relatively small size of the construct these overlap significantly.
The goal is to assemble these 4 sequences into a single contig in .fastq format (therefore retaining the per-base quality scores), and then downstream I will align this back to the reference construct using bwa mem.
I am trying to automate this procedure for hundreds of sequenced constructs. Previously this has been done manually in Geneious, using Geneious assemble (de novo assembly). The problem is that it is not possible to run Geneious assemble from the command line, and the other tools I have tried (cap3, tracy, tadpole) either fail to generate a full-length contig (whereas Geneious succeeds) and/or do not output per-base quality scores (.fastq).
I would have thought that it would be a piece of cake to find an open-source tool that can match Geneious assemble, but this is not the case!
Can someone recommend a tool that I could try, or suggest how I can optimise a tool to match Geneious assemble?
Any suggestions appreciated!
Not answering your question, but thinking aloud. Unless your input data strictly conforms to a pattern, it may be difficult to find a tool that does something like this perfectly without manual intervention. With Sanger sequences, the ends of the reads are going to be variable as the quality degrades, so this is not a simple problem. Since you have very specific requirements (e.g. the need to create fastq files), it may be a tall ask to find a command-line tool.
tracy would have been my recommendation for a recent tool, but you seem to have tried it already.
Re: the read ends, I would have thought that such tools would be able to generate a consensus based on highest base quality / consensus between multiple sequences. I will continue playing around with tracy - thanks for the thoughts / info!
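For anyone wanting to experiment with the "highest base quality" idea mentioned above, here is a minimal pure-Python sketch, not a replacement for a real assembler. It assumes the reads are already decoded into bases and Phred scores, that each read's start position (`offset`) in a shared coordinate system is known (for example from aligning each read to the reference), and that there are no indels between reads. The function names, the window/threshold defaults, and the rule for scoring disagreements are all hypothetical choices made for illustration.

```python
def trim_low_quality(seq, quals, min_q=20, window=10):
    """Trim both ends until a window of `window` bases has mean Phred >= min_q.

    Returns ("", []) if no window passes (including reads shorter than `window`).
    """
    n = len(seq)
    start = 0
    while start + window <= n and sum(quals[start:start + window]) / window < min_q:
        start += 1
    if start + window > n:
        return "", []
    end = n
    while end - window >= start and sum(quals[end - window:end]) / window < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def merge_reads(seq_a, qual_a, seq_b, qual_b, offset):
    """Merge two reads where read B starts at position `offset` (>= 0) of read A.

    Heuristic (an assumption, not any tool's documented behaviour):
    - bases agree: keep the base with the higher of the two qualities
    - bases disagree: keep the higher-quality base, scored as the difference
    - position covered by neither read: emit N with quality 0
    """
    length = max(len(seq_a), offset + len(seq_b))
    bases, quals = [], []
    for i in range(length):
        j = i - offset
        a = (seq_a[i], qual_a[i]) if i < len(seq_a) else None
        b = (seq_b[j], qual_b[j]) if 0 <= j < len(seq_b) else None
        if a and b:
            if a[0] == b[0]:
                base, q = a[0], max(a[1], b[1])
            else:
                base = a[0] if a[1] >= b[1] else b[0]
                q = abs(a[1] - b[1])
        elif a or b:
            base, q = a or b
        else:
            base, q = "N", 0
        bases.append(base)
        quals.append(q)
    return "".join(bases), quals


def to_fastq(name, seq, quals):
    """Format one record as FASTQ text with Phred+33 quality encoding."""
    qual_str = "".join(chr(min(q, 93) + 33) for q in quals)
    return f"@{name}\n{seq}\n+\n{qual_str}\n"


if __name__ == "__main__":
    # Toy reads: B overlaps the last 4 bases of A and disagrees at one position.
    seq, quals = merge_reads("ACGTACGT", [30] * 6 + [10, 10],
                             "ACGAGG", [12, 12, 35, 35, 35, 35], offset=4)
    print(to_fastq("contig1", seq, quals), end="")
```

Real Sanger reads will contain indels and miscalls near the ends, so in practice the offsets would have to come from a proper pairwise alignment, and the merge would need to walk the alignment columns rather than raw positions. The sketch only shows how per-base qualities can be carried through to the consensus so that the result can be written as .fastq.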