Create A Consensus Of Very Short Sequences
11.1 years ago
Gabriel R. ★ 2.9k

Imagine I have a set of sequences in FASTQ format where the actual sequences look like this:

TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAT
TAGGGTTGGGCCTGACAAGTCAG

Assume these short tags all stem (theoretically speaking) from the same template, so the final G in the last sequence is most likely a sequencing error. Is there a way to run something like:

./magicalprogram in.fastq > out.fastq

where out.fastq would contain TAGGGTTGGGCCTGACAAGTCAT, since that is the consensus?

Thanks!

consensus sequence sequencing alignment • 2.5k views

Do they all have exactly the same length?


In theory, yes; in practice, no.

11.1 years ago
Ido Tamir 5.2k

CD-HIT or USEARCH are good programs for this.

11.1 years ago
Biojl ★ 1.7k

Take a look at the Biopython section entitled "18.3.2 Calculating a quick consensus sequence".

It should be very easy to use it in a simple script to achieve your objective.
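
A minimal sketch of such a script is given here, using only the Python standard library rather than the Biopython function mentioned above. It assumes all reads start at the same template position and differ only by substitutions, so no alignment is performed; the names `read_fastq_sequences` and `majority_consensus` are hypothetical helpers invented for this illustration. Where reads differ in length, trailing positions are called from whichever reads cover them.

```python
import sys
from collections import Counter
from itertools import zip_longest


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            seq = line.strip()
            if seq:
                yield seq


def majority_consensus(seqs):
    """Return the most frequent base at each position.

    Assumption: reads are already in register (same start, no indels).
    Ties are resolved in favour of the base seen first at that position.
    """
    consensus = []
    for column in zip_longest(*seqs, fillvalue=None):
        # Ignore reads that are too short to cover this position.
        counts = Counter(base for base in column if base is not None)
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python consensus.py in.fastq
    with open(sys.argv[1]) as handle:
        print(majority_consensus(list(read_fastq_sequences(handle))))
```

This prints the consensus as a bare sequence rather than a FASTQ record, since a consensus has no obvious quality string. If the reads come from several different templates, they would first need to be grouped (for example with a clustering tool) before taking a per-group consensus.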


Thank you! Does someone happen to have a ready-made script?

11.1 years ago
Neilfws 49k

If you're looking to correct sequencing errors, you might try the soon-to-be published Blue software package. It takes FASTQ or FASTA as input.

There are other tools available too; try "sequence error correction software" as your search terms.
