Question

Convert STOCKHOLM alignment to FASTA alignment

10.2 years ago

venu 7.1k

Hello,

I've generated a multiple sequence alignment of protein sequences with the HMMER suite. The possible output formats are SELEX and STOCKHOLM. Is there any way to convert these formats to FASTA multiple sequence alignment format? I've found this, but it does not accept large files.

stockholm alignment fasta • 14k views

updated 3.0 years ago by Ram 45k • written 10.2 years ago by venu 7.1k

Ram · Answer 1 · 2015-02-25

The esl-* tools ship with HMMER, don't they? esl-reformat lists afa (aligned FASTA) among its aligned output formats:

esl-reformat -h
# esl-reformat :: convert between sequence file formats
# Easel h3.1b1 (May 2013)
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the Janelia Farm Software License.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Usage: esl-reformat [-options] <format> <seqfile>
  Output format choices: Unaligned      Aligned    
                         -----------    -------    
                         fasta          a2m        
                         hmmpgmd        afa        
                                        clustal    
                                        clustallike
                                        pfam       
                                        phylip     
                                        phylips    
                                        psiblast   
                                        selex      
                                        stockholm  
  where options are:
  -d             : convert to DNA alphabet (U->T)
  -h             : help; print brief info on version and usage
  -l             : convert to lower case
  -n             : remove DNA IUPAC codes; convert ambig chars to N
  -o <s>         : send output to file <f>, not stdout
  -r             : convert to RNA alphabet (T->U)
  -u             : convert to upper case
  -x             : convert non-IUPAC chars (e.g. X) in DNA to N
  --gapsym <s>   : convert all gaps to character <c>
  --informat <s> : input sequence file is in format <s>
  --mingap       : remove columns containing all gaps (seqfile=MSA)
  --keeprf       : with --mingap, keep all nongap #=GC RF columns
  --nogap        : remove columns containing any gaps (seqfile=MSA)
  --wussify      : convert old RNA structure markup lines to WUSS
  --dewuss       : convert WUSS RNA structure markup to old format
  --fullwuss     : convert simple WUSS notation to full (output) WUSS
  --ignore <s>   : ignore input seq characters listed in string <s>
  --acceptx <s>  : accept input seq chars in string <s> as X
  --rename <s>   : rename and number each sequence <s>.<n>
  --replace <s>  : <s> = <s1>:<s2> replace characters in <s1> with those in <s2>
  --small        : use minimal RAM, input must be pfam, output must be afa or pfam
  --id_map <s>   : if format is hmmpgmd, put the id map into file <s>
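
Going by the usage line and the format table above, a conversion to aligned FASTA would look roughly like the sketch below. The file names example.sto and example.afa and the tiny test alignment are made up for illustration. The commented-out --small variant is an untested assumption based on the help text, which says the input must be in Pfam format (one line per sequence).

```shell
# Hypothetical file names; replace with your own.
in=example.sto
out=example.afa

# A tiny Stockholm alignment so the sketch can be run end to end.
cat > "$in" <<'EOF'
# STOCKHOLM 1.0
seq1  ACDEFGHIK
seq2  ACD-FGHIK
//
EOF

if command -v esl-reformat >/dev/null 2>&1; then
    # "afa" is the aligned-FASTA choice in the help text; -o writes to a file.
    esl-reformat -o "$out" afa "$in"
    # For very large one-line-per-sequence input, the help text points to --small:
    # esl-reformat --small --informat pfam -o "$out" afa "$in"
else
    echo "esl-reformat not found on PATH" >&2
fi
```

Note that the plain "fasta" format choice sits in the unaligned column of the help text, so "afa" is the one to use when the gaps need to be kept.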

GouthamAtla · Answer 2 · 2015-02-25

10.2 years ago

GouthamAtla 12k

This script or this script should do that; both depend on BioPerl. Try them and validate the output. If they do not work, try modifying the script.
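
If BioPerl is not available, the core of the conversion can also be sketched with plain awk. This is not the method used by the linked scripts, only a minimal illustration that assumes a single alignment per file, sequence names without whitespace, and enough memory to hold the whole alignment. The file names are hypothetical. Gap characters are passed through unchanged, so the result is aligned FASTA.

```shell
# Hypothetical interleaved Stockholm input (two blocks).
cat > interleaved.sto <<'EOF'
# STOCKHOLM 1.0
#=GF ID example
seq1  ACD.E
seq2  ACDfE
#=GC RF xxx.x

seq1  FG-HI
seq2  FGKHI
//
EOF

awk '
    # Skip markup lines, the // terminator, and blank lines.
    /^#/ || /^\/\// || NF < 2 { next }
    # Remember the order in which sequence names first appear.
    !($1 in seq) { order[++n] = $1 }
    # Append this block to the sequence collected so far.
    { seq[$1] = seq[$1] $2 }
    END {
        for (i = 1; i <= n; i++)
            printf ">%s\n%s\n", order[i], seq[order[i]]
    }
' interleaved.sto > converted.afa

cat converted.afa
```

As suggested above, validate the output against the original alignment, for example by checking that the number of sequences and the alignment length are unchanged.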

updated 4.4 years ago by Ram 45k • written 10.2 years ago by GouthamAtla 12k

Please note that stockholm2fasta-pl writes unaligned FASTA by default. The gapped output (with -g) is in afa format, so it contains both dots and dashes.
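
If a downstream tool only accepts dashes as gaps, the dots can be normalized afterwards. A minimal sketch, assuming a hypothetical file gapped.afa; it leaves header lines alone and rewrites only sequence lines. The --gapsym option listed in the esl-reformat help above appears to offer the same thing, though that is not tested here.

```shell
# Hypothetical aligned-FASTA input that mixes '.' and '-' as gap characters.
cat > gapped.afa <<'EOF'
>seq1
ACD..FGHIK
>seq2
ACDef-GHIK
EOF

# Print header lines (starting with '>') unchanged;
# on sequence lines, replace every '.' with '-'.
awk '/^>/ { print; next } { gsub(/\./, "-"); print }' gapped.afa > dashes.afa

cat dashes.afa
```

Lowercase residues, which HMMER uses for insert columns, are left as they are; only the gap character changes.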

updated 4.4 years ago by Ram 45k • written 5.3 years ago by Fatima ▴ 1000

a.polo88 · Answer 3 · 2015-02-25

10.2 years ago

a.polo88 ▴ 120

http://blog.mckuhn.de/2010/08/convert-stockholm-sequence-format-to.html?m=1

written 10.2 years ago by a.polo88 ▴ 120

New link: https://biocs-blog.blogspot.com/2010/08/convert-stockholm-sequence-format-to.html

written 5.3 years ago by Fatima ▴ 1000