Question

Create A Consensus Of Very Short Sequences

0

11.1 years ago

Gabriel R. ★ 2.9k

Imagine I have a set of sequences in FASTQ format where the actual sequences look like this:

TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAG

Imagine these little tags all stem (theoretically speaking) from the same template. The final G in the last sequence is likely a sequencing error. Is there a way to run something like:

./magicalprogram in.fastq > out.fastq

where out.fastq would contain TAGGGTTGGGCCTGACAAGTCAT, as this is the consensus?

Thanks !

consensus sequence sequencing alignment • 2.5k views

updated 11.1 years ago by Neilfws 49k • written 11.1 years ago by Gabriel R. ★ 2.9k

0

Do they all have exactly the same length?

11.1 years ago by Pierre Lindenbaum 164k

0

In theory, yes; in practice, no.

11.1 years ago by Gabriel R. ★ 2.9k

score 2 · Answer 1 · 2013-12-03

2

11.1 years ago

Ido Tamir 5.2k

CD-HIT or USEARCH are good programs for this.

ADD COMMENT • link 11.1 years ago by Ido Tamir 5.2k

score 1 · Answer 2 · 2013-12-03

1

11.1 years ago

Biojl ★ 1.7k

Take a look at the Biopython section entitled "18.3.2 Calculating a quick consensus sequence".

It should be very easy to use it in a simple script to achieve your objective.
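For illustration, here is a minimal sketch of such a script. It is not the Biopython routine mentioned above; it uses only the Python standard library and takes a simple majority vote at each position. It assumes all reads start at the same base and differ only by substitutions; reads of unequal length are compared from the left, and shorter reads simply do not vote at the trailing positions. The function names are hypothetical, and it prints the consensus as plain sequence rather than a FASTQ record, since no consensus quality is computed.

```python
import sys
from collections import Counter
from itertools import zip_longest


def read_fastq_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            seq = line.strip()
            if seq:
                yield seq


def column_consensus(sequences):
    """Return the per-position majority base.

    Assumption: reads are already in register (same start, no indels).
    """
    consensus = []
    # zip_longest pads shorter reads with None; padded positions do not vote
    for column in zip_longest(*sequences):
        counts = Counter(base for base in column if base is not None)
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        reads = list(read_fastq_sequences(handle))
    print(column_consensus(reads))
```

Saved under a name of your choice (say, consensus.py), it would be run as `python consensus.py in.fastq > out.txt`. If the reads can contain insertions or deletions, a column-wise vote like this is not enough, and the reads would need to be aligned or clustered first.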

11.1 years ago by Biojl ★ 1.7k

0

Thank you! Does someone happen to have a ready-made script?

11.1 years ago by Gabriel R. ★ 2.9k

score 0 · Answer 3 · 2013-12-03

0

11.1 years ago

Neilfws 49k

If you're looking to correct sequencing errors, you might try the soon-to-be-published Blue software package. It takes FASTQ or FASTA as input.

There are other tools available too; "sequence error correction software" are your search terms.

ADD COMMENT • link 11.1 years ago by Neilfws 49k