Convert Stockholm alignment to FASTA alignment
9.8 years ago
venu 7.1k

Hello,

I've generated a multiple sequence alignment of protein sequences with the HMMER suite. The possible output formats are SELEX and Stockholm. Is there any way to convert these formats to a FASTA multiple sequence alignment? I've found this, but it does not accept large files.

9.8 years ago
5heikki 11k

esl-* tools ship with HMMER, don't they?

esl-reformat -h
# esl-reformat :: convert between sequence file formats
# Easel h3.1b1 (May 2013)
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the Janelia Farm Software License.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Usage: esl-reformat [-options] <format> <seqfile>
  Output format choices: Unaligned      Aligned    
                         -----------    -------    
                         fasta          a2m        
                         hmmpgmd        afa        
                                        clustal    
                                        clustallike
                                        pfam       
                                        phylip     
                                        phylips    
                                        psiblast   
                                        selex      
                                        stockholm  
  where options are:
  -d             : convert to DNA alphabet (U->T)
  -h             : help; print brief info on version and usage
  -l             : convert to lower case
  -n             : remove DNA IUPAC codes; convert ambig chars to N
  -o <s>         : send output to file <f>, not stdout
  -r             : convert to RNA alphabet (T->U)
  -u             : convert to upper case
  -x             : convert non-IUPAC chars (e.g. X) in DNA to N
  --gapsym <s>   : convert all gaps to character <c>
  --informat <s> : input sequence file is in format <s>
  --mingap       : remove columns containing all gaps (seqfile=MSA)
  --keeprf       : with --mingap, keep all nongap #=GC RF columns
  --nogap        : remove columns containing any gaps (seqfile=MSA)
  --wussify      : convert old RNA structure markup lines to WUSS
  --dewuss       : convert WUSS RNA structure markup to old format
  --fullwuss     : convert simple WUSS notation to full (output) WUSS
  --ignore <s>   : ignore input seq characters listed in string <s>
  --acceptx <s>  : accept input seq chars in string <s> as X
  --rename <s>   : rename and number each sequence <s>.<n>
  --replace <s>  : <s> = <s1>:<s2> replace characters in <s1> with those in <s2>
  --small        : use minimal RAM, input must be pfam, output must be afa or pfam
  --id_map <s>   : if format is hmmpgmd, put the id map into file <s>
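Going by the format list above, `afa` is the aligned FASTA output choice, so a Stockholm-to-FASTA conversion should look roughly like the sketch below. The file names `tiny.sto` and `tiny.afa` and the two-sequence alignment are made-up placeholder data, and the sketch assumes `esl-reformat` is on your PATH.

```shell
# Write a tiny two-sequence Stockholm alignment (placeholder data).
cat > tiny.sto <<'EOF'
# STOCKHOLM 1.0
seq1 ACDE-FGH
seq2 AC-EKFGH
//
EOF

if command -v esl-reformat >/dev/null 2>&1; then
  # --informat tells esl-reformat the input is Stockholm instead of letting it guess;
  # -o writes to a file; "afa" is the aligned FASTA output format.
  esl-reformat --informat stockholm -o tiny.afa afa tiny.sto
  cat tiny.afa
else
  echo "esl-reformat not found; build Easel or install HMMER first" >&2
fi
```

For a real alignment the same call becomes `esl-reformat -o out.afa afa yourfile.sto`, with your own file names.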

Easel can be downloaded from here.

Instructions for building the source code are at the link.

esl-reformat is in the miniapps folder.

./esl-reformat a2m seqfile > alignfile

If you use hmmalign to create the alignment, it can write A2M format directly:

hmmalign --outformat A2M -o alignment.a2m <hmmfile> <seqfile>
9.8 years ago

This script or this script should do that; both depend on BioPerl. Try them and validate the output. If they do not work, try modifying the script.
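If installing BioPerl is inconvenient, a dependency-free sketch in plain awk can do the basic conversion. This is not one of the linked scripts, and `stockholm_to_afa` is a hypothetical helper name. It assumes one alignment per file, discards all `#` markup lines (#=GF, #=GS, #=GR, #=GC), joins interleaved blocks by sequence name, keeps gaps exactly as in the input, and holds the whole alignment in memory.

```shell
# stockholm_to_afa (hypothetical helper): minimal Stockholm -> aligned FASTA sketch.
# Assumes a single alignment per input; all lines starting with # are dropped.
stockholm_to_afa() {
  awk '
    /^#/ || /^\/\// || NF == 0 { next }      # skip markup, the // terminator, blank lines
    {
      if (!($1 in seq)) order[++n] = $1      # remember first-seen order of names
      seq[$1] = seq[$1] $2                   # append this block segment to the sequence
    }
    END {
      for (i = 1; i <= n; i++) printf(">%s\n%s\n", order[i], seq[order[i]])
    }
  ' "$@"
}
```

Usage: `stockholm_to_afa alignment.sto > alignment.afa` (it also reads standard input when no file is given). Validate the output against a known-good tool on a small file before trusting it on large alignments.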


Please note that the gapped output of stockholm2fasta.pl (when using -g) is in afa format, so it contains both dots and dashes; the default output is unaligned FASTA.
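If a downstream tool accepts only `-` as the gap character, the dots can be rewritten afterwards. The `--gapsym` option in the esl-reformat help above looks like another route. Below is a minimal awk sketch; `dots_to_dashes` is a hypothetical helper name, and it assumes a `.` on a sequence line is always a gap. Header lines are left untouched, because IDs such as versioned accessions can contain dots.

```shell
# dots_to_dashes (hypothetical helper): rewrite '.' gaps as '-' on sequence lines only.
dots_to_dashes() {
  awk '/^>/ { print; next } { gsub(/\./, "-"); print }' "$@"
}
```

Usage: `dots_to_dashes gapped.afa > gapped_dashes.afa`.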

