Question

Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?

7

Entering edit mode

15.0 years ago

Dave Lunt ★ 2.0k

Problem: I have a multiple alignment of sequence reads. Each individual is represented by two sequences, forward and reverse (e.g. individual1_For and individual1_Rev) that don't overlap at the ends. I want my alignment to have one sequence per individual that is the consensus of the forward and reverse sequences. NB I don't want to create a consensus across all individuals. I want to recognise sequences that share a name e.g. "individual1" and replace them with a full-length consensus of those two sequences.

Does anybody know of existing script or programs to do this? If it doesn't already exist I shall probably start putting something together with BioPerl, any suggestions of best approach?

consensus phylogenetics fasta multiple-alignment • 16k views

ADD COMMENT • link updated 17 months ago by Ram 45k • written 15.0 years ago by Dave Lunt ★ 2.0k

1

Entering edit mode

what is the main reason to do this? I mean you are going to use the output for? I ask this because you can use a logo to display the consensus.

ADD REPLY • link 15.0 years ago by Paulo Nuin ★ 3.7k

0

Entering edit mode

what does it means to calculate the consensus between the forward and reversed sequence? Could you make an example of your input and desired output?

ADD REPLY • link 15.0 years ago by Giovanni M Dall'Olio 28k

0

Entering edit mode

forward: ACTGGACTAGACTACA----
reverse: -----ACTAGTCTACAGCTA
consens: ACTGGACTAGNCTACAGCTA

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 15.0 years ago by Eric Normandeau 11k

0

Entering edit mode

forward: ACTGGACTAGACTACA----
reverse: -----ACTAGTCTACAGCTA
consens: ACTGGACTAGNCTACAGCTA

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 15.0 years ago by Eric Normandeau 11k

Ram · Answer 1 · 2010-04-13

I have had to do something very similar recently.

The approach to solving your problem requires 3 steps. Here is my take:

Use Python with Biopython to pass through your sequence couples, one individual at a time. Reverse complement the second sequence. (use any other preferred approach)
Use the "dialign2" program repeatedly for each individual to create alignments. In this way, your 2 sequences will have the same length (with "-" characters to complete the missing parts).
Use Python to pass through each alignment (2 sequences each time) and create a consensus. This should be easy.

Here is an attempt for the third part:

def create_consensus(seq1, seq2):
    """Sequences must be strings, have the same length, and be aligned"""
    out_seq = ""
    for i, nucleotide in enumerate(seq1):
        couple = [nucleotide, seq2[i]]
        if couple[0] == "-":
            out_seq += couple[1]
        elif couple[1] == "-":
            out_seq += couple[0]
        elif couple[0] == couple[1]:
            out_seq += couple[0]
        elif not couple[0] == couple[1]:
            out_seq += "N"
    return out_seq

seq1 = "-----CAGATACGCGCACATAGACATAG"
seq2 = "ACTGACAGATACAGACACATA-------"

print seq1
print seq2
print create_consensus(seq1, seq2)

The printed result is:

-----CAGATACGCGCACATAGACATAG
ACTGACAGATACAGACACATA-------
ACTGACAGATACNNNCACATAGACATAG

This is admittedly quickly done and there may be both 1) mistakes or special cases 2) a better way to do this, maybe with Biopython.

Hope it helps!

Cheers

Ram · Answer 2 · 2010-05-03

Following up on Eric's useful answer, there is functionality to do this in Biopython. The Tutorial provides details on using it: http://biopython.org/DIST/docs/tutorial/Tutorial.html

Chapter 6 deals with reading in alignment objects
Chapter 14.3 provides useful functionality once you have an alignment object.

Specifically the function "dumb_consensus" calculates a consensus along the lines of what you are doing with create_consensus, with a bit of extra functionality like defining when you call something ambiguous. Source code for that is here:

http://github.com/biopython/biopython/blob/master/Bio/Align/AlignInfo.py

Ram · Answer 3 · 2010-06-22

2

Entering edit mode

14.8 years ago

Steve Moss 2.3k

I worked on a solution to this that is available here - https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py

Cheers,
Steve

ADD COMMENT • link updated 17 months ago by Ram 45k • written 14.8 years ago by Steve Moss 2.3k

0

Entering edit mode

hi, Steve Moss, I cant not connect to that link, could you please send me that script, thank you!

ADD REPLY • link 12.4 years ago by litiancheng.gansu ▴ 10

0

Entering edit mode

Hi! My apologies... I moved it over to GitHub... I've edited the post accordingly, but please now go here https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py

ADD REPLY • link 12.4 years ago by Steve Moss 2.3k

Ram · Answer 4 · 2011-06-14

1

Entering edit mode

13.9 years ago

2184687-1231-83- ★ 5.1k

Pagan has a consensus option that does what you need: http://www.ebi.ac.uk/~ari/pagan/

ADD COMMENT • link updated 17 months ago by Ram 45k • written 13.9 years ago by 2184687-1231-83- ★ 5.1k

0

Entering edit mode

Thanks, looks interesting, I'll have to check that out

ADD REPLY • link 13.9 years ago by Dave Lunt ★ 2.0k