I don't know if I'm using the proper terminology, but I have ab1 (Sanger) sequencing chromatograms, and I was wondering whether anyone knows of software that can take a trace containing two overlapping reads (e.g. overlaid indel-type mutations) and give two or more sequences as FASTA output.
For example, suppose a read starts with an unambiguous ACTGGCGA, and then 60% of the population has an A while 40% has that A deleted, followed by GCGTGA. phred will likely give me ACTGGCGAAGCGTGA, but is there a way for a basecaller to give me ACTGGCGAGCGTGA as a secondary call? Or, if the mix is 50/50 and I know that one version is ACTGGCGAAGCGTGA, can I supply that as a comparison file for subtraction, leaving me with a residual signal of either the full ACTGGCGAGCGTGA or of GCGTGAx?
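To make the subtraction idea concrete, here is a minimal Python sketch on idealized data. It is not how phred or any existing basecaller works. It assumes peak heights have already been extracted per called position and normalized so that a full-strength peak is 1.0, which real traces do not give you directly. The names `mix` and `subtract_known`, and the `known_fraction` and `noise` parameters, are made up for illustration.

```python
def mix(seq_a, seq_b, frac_a=0.5):
    """Build an idealized mixed signal from two sequences (hypothetical helper).

    Each position is a dict base -> relative peak height, where a base
    present in both sequences sums to 1.0.
    """
    signal = []
    for i in range(max(len(seq_a), len(seq_b))):
        heights = {}
        if i < len(seq_a):
            heights[seq_a[i]] = heights.get(seq_a[i], 0.0) + frac_a
        if i < len(seq_b):
            heights[seq_b[i]] = heights.get(seq_b[i], 0.0) + (1.0 - frac_a)
        signal.append(heights)
    return signal


def subtract_known(signal, known, known_fraction=0.5, noise=0.05):
    """Remove the known sequence's share of the signal and call what is left.

    Positions where nothing above `noise` remains are reported as '-'.
    """
    residual = []
    for heights, ref_base in zip(signal, known):
        left = dict(heights)
        left[ref_base] = left.get(ref_base, 0.0) - known_fraction
        base, height = max(left.items(), key=lambda kv: kv[1])
        residual.append(base if height > noise else "-")
    return "".join(residual)


full = "ACTGGCGAAGCGTGA"      # version with the extra A
deleted = "ACTGGCGAGCGTGA"    # version with the A deleted
print(subtract_known(mix(full, deleted), full))  # ACTGGCGAGCGTGA-
```

The trailing `-` marks the last position, where only the known sequence contributes signal. On real data the peak heights vary from base to base, so the subtraction would need some per-position scaling before it could work.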
Obviously I would have to be able to set a threshold below which I consider the result "noise" (either a fixed value of signal strength or, e.g., 5% of the main peak strength) in order to filter out illegitimate base calls in non-chimeric sequence.
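The two kinds of threshold can be sketched like this. It is a naive illustration, not an existing tool's behaviour. It assumes the four channel intensities at each called peak position have already been pulled out of the ab1 file (for example with an ABIF parser such as the one in Biopython). The function name and the `rel_threshold` and `abs_threshold` parameters are hypothetical.

```python
def call_primary_secondary(peaks, rel_threshold=0.05, abs_threshold=0.0):
    """Call a primary and a secondary base at each position.

    peaks: list of dicts base -> peak height at one called position.
    A secondary base is only called if its height reaches both the fixed
    cutoff and the given fraction of the main peak. Otherwise the
    secondary call repeats the primary base.
    """
    primary, secondary = [], []
    for heights in peaks:
        ranked = sorted(heights.items(), key=lambda kv: kv[1], reverse=True)
        top_base, top_height = ranked[0]
        primary.append(top_base)
        cutoff = max(abs_threshold, rel_threshold * top_height)
        if len(ranked) > 1 and ranked[1][1] >= cutoff:
            secondary.append(ranked[1][0])
        else:
            secondary.append(top_base)
    return "".join(primary), "".join(secondary)


# Made-up peak heights: only the middle position has a real second peak.
peaks = [
    {"A": 1000, "C": 20},   # C is 2% of the main peak -> noise
    {"G": 600, "A": 400},   # 60/40 mix -> secondary call A
    {"C": 900, "T": 30},    # T is about 3% of the main peak -> noise
]
print(call_primary_secondary(peaks))  # ('AGC', 'AAC')
```

Setting `abs_threshold` above zero gives the fixed-value variant, and the larger of the two cutoffs applies at each position.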
As a separate but related issue, is there a way to get phred (or any other program) to "filter" noise spikes in chromatograms? For example, sometimes I see spikes in pyrimidine signal strength that are way out of proportion to legitimate regions of call: the peak height in an ab1 viewer like bioedit or consed (I'm not sure if the scale is linear or log) is well over double that of any nearby base, or even of any other place in the file. These spikes typically have a "width" of about 5 base calls.
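One simple way to flag such spikes is to compare each peak height with the median height of the whole channel. This is only a sketch: the median baseline and the `factor` value are assumptions rather than something phred does, and the function names are hypothetical. It also assumes the spike covers a small part of the read, so that the median still reflects normal peaks.

```python
from statistics import median


def flag_spikes(heights, factor=2.5):
    """Return indices of peaks taller than `factor` times the median height."""
    limit = factor * median(heights)
    return [i for i, h in enumerate(heights) if h > limit]


def clip_spikes(heights, factor=2.5):
    """Cap every peak at `factor` times the median height."""
    limit = factor * median(heights)
    return [min(h, limit) for h in heights]


# Made-up peak heights for one channel with a spike about 5 base calls wide.
heights = [800, 900, 850, 870, 820, 910, 880,
           3000, 3200, 2900, 3100, 3050,
           860, 840, 890, 900, 830]
print(flag_spikes(heights))  # [7, 8, 9, 10, 11]
```

Clipping only reduces how much the spike dominates the display or a later normalization step. The base calls inside the flagged region are still suspect and are probably better masked than trusted.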
Thanks for adding your answer here. If you have more details, or want to share your experience after running polyphred, you could add a comment to your answer :-)
This didn't quite give me what I needed. I ended up taking the algorithm for "Multiple SeqDoC" (http://research.imb.uq.edu.au/seqdoc/) and modifying it to automatically call secondary peaks and to output the results both in-image and as a secondary text line. My calls are pretty naive: they are based on peak height above a threshold, and sometimes a base that differs from the reference is missed. After that, you can create a fake trace with mktrace, or just duplicate the original trace and edit it for the secondary call.
I'm not quite sure where (or whether) to upload my modifications. It is a large improvement (IMHO) over the referenced SeqDoC algorithm, but it needs quite a bit of development to be truly useful to many people.