I am looking for a command line tool to get a consensus sequence, with a proper example usage not from a paper where everything is theoretical or from a software suite. I have reviewed options but quality of answers and documentation is bad. For example there is UGENE but they don't provide a command line. Also the Wikipedia page does not list any software or source code. A previous post in Biostars where the question was about Consensus Sequence and then people started suggesting Sequence Logos? I don't want sequence logo :)
Some people links to a program called USEARCH, but there is no command line usage example and it seems is not open source or the code is not available (documentation looks confusing too). Same happens with a software called CD-HIT. The biopython considers two algorithms: One is "dumb" and another is based in "Position Specific Score Matrices". However I don't want to go through python, or perl, ruby, etc. Another one called Consensus Maker from the HIV sequence database is web-based and not open source.
Situation is worst after reading the UGENE manual. They describe several algorithms:
JalView (Default) - it is based on the JalView algorithm. Returns '+' if there are 2 characters with high frequency. Returns symbol in lower case if the symbol content in a row is lower than the specified threshold.
ClustalW - emulates the ClustalW program and file format behavior.
Levitsky - this algorithm is proposed by Victor Levitsky to calculate consensus of DNA alignments. At first, it collects global alignment frequencies for every symbol using extended (15 symbols) DNA alphabet. Then, for every column it selects the rarest symbol in the whole alignment with percentage in the column greater or equals to the threshold value.
Strict - the algorithm returns gap character ('-') if symbol frequency in a column is lower than the threshold specified.
So there are lot of links and explanations but no command line?
Wow! What an exhaustive research!!! Wouldn't it have been just faster to code it yourself?
I compute the consensus sequence from the Position-Weight Matrices (PWM). Biopython can do it (yes I know you don't want it -that's why it's a comment and not an answer).
ADD REPLY
• link
updated 3.0 years ago by
Ram
44k
•
written 10.4 years ago by
Hugues
▴
250
To extract consensus in UGENE with the command line interface you need to use UGENE Workflow Designer. There is an element that extracts consensus from each input alignment. You can create a workflow that reads alignment, extracts consensus and output consensus sequences to files. Since any workflow can be run from command line that's exactly what you need. Actually, you can find the pre-built workflow in the Samples bar of the Workflow Designer.
So it will be something like: ugenecl --task=extract_consensus --in=in.aln --out=out.fa
P.S. These functions will be released only in UGENE 1.14, that it why you did not find it in the documentation. But to get them right now, you can use the development snapshot from here
I wrote this perl script to take in a fasta alignment and output a full IUPAC consensus and also list 50 probes of X size which are weighted for A/C/T/G. It might help:
Wow! What an exhaustive research!!! Wouldn't it have been just faster to code it yourself?
I compute the consensus sequence from the Position-Weight Matrices (PWM). Biopython can do it (yes I know you don't want it -that's why it's a comment and not an answer).