Find consensus sequence of several DNA sequences
1
0
7.0 years ago
Bella_p ▴ 70

Hi!

I have a list of around 200 different DNA sequences, each ~150 bp long, and I'd like to find a consensus sequence for all of them. There is probably a function for this that I'm not familiar with. Does anyone know which package/function to use? I'd prefer Python, but R is also OK.

Thanks!

python R alignment consensus sequence • 13k views
1

Have you tried a multiple sequence alignment?

Any of these tools should provide you with a consensus sequence:

https://www.ebi.ac.uk/Tools/msa/

6
7.0 years ago
st.ph.n ★ 2.7k

You can use Biopython to create a consensus sequence.

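#!/usr/bin/env python

import sys
from Bio import AlignIO
from Bio.Align import AlignInfo

# Read an already-aligned FASTA file (all sequences must have the same length)
alignment = AlignIO.read(sys.argv[1], 'fasta')
summary_align = AlignInfo.SummaryInfo(alignment)

# Call the consensus at the given threshold (a fraction between 0 and 1) and print it.
# Note: dumb_consensus may be deprecated in recent Biopython releases; check your version.
consensus = summary_align.dumb_consensus(float(sys.argv[2]))
print(consensus)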

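Save as consensus.py and run as python consensus.py input.fasta x, where x is the fraction of sequences (between 0 and 1) that must share a residue for it to be called at that position in the consensus; e.g. python consensus.py input.fasta 0.5 means a residue or nucleotide has to be present in at least 50% of the sequences to call that position. Positions that don't reach the threshold are reported as the ambiguous character (X by default). The input has to be an alignment already, i.e. all sequences the same length, since AlignIO.read expects aligned records.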
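If it helps to see what a threshold-based consensus does column by column, here is a minimal pure-Python sketch. It is not Biopython's implementation; the function name threshold_consensus, the choice of N as the ambiguous character, and the toy sequences are all assumptions made for illustration. It assumes the sequences are already aligned, skips gap characters when counting, and treats ties as ambiguous.

```python
from collections import Counter


def threshold_consensus(seqs, threshold=0.5, ambiguous="N"):
    """Return a simple consensus for equal-length (aligned) sequences.

    Hypothetical helper for illustration only. A base is called at a
    position when it is the single most common non-gap character and its
    share of the non-gap characters is >= threshold.
    """
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*seqs):
        # Count bases in this column, ignoring gap characters
        counts = Counter(base for base in column if base not in "-.")
        total = sum(counts.values())
        if total == 0:
            consensus.append(ambiguous)  # column contains only gaps
            continue
        top = counts.most_common(2)
        best_base, best_count = top[0]
        tied = len(top) > 1 and top[1][1] == best_count
        if not tied and best_count / total >= threshold:
            consensus.append(best_base)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Toy aligned input (made up for this example)
aligned = ["ACGTAC", "ACGTTC", "ACCTAC", "A-GTGC"]
print(threshold_consensus(aligned, 0.5))
print(threshold_consensus(aligned, 0.75))
```

Raising the threshold makes the call stricter, so more positions fall back to the ambiguous character. For real data you would still run a multiple sequence alignment first and feed the aligned sequences in.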
