Dear community members,
We present SEDA, an open-source application for processing FASTA files containing DNA and protein sequences. The source code is available on GitHub, and a complete user manual is available here.
Among other operations, SEDA allows users to filter sequences based on different criteria (including text patterns), translate nucleic acid sequences into amino acid sequences, run BLAST analyses, remove duplicated sequences and isoforms, and sort, merge, split, or reformat FASTA files. It has been successfully used to support the workflows described in the following publication: Bioinformatics Protocols for Quickly Obtaining Large-Scale Data Sets for Phylogenetic Inferences.
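To give a flavor of what some of these operations do, here is a minimal Python sketch (not SEDA's own code) of three of them applied to FASTA text held in memory: parsing records, keeping only sequences that match a text pattern, and removing duplicated sequences. The function names (`read_fasta`, `filter_by_pattern`, `remove_duplicates`, `write_fasta`) are hypothetical, and the sketch assumes that "duplicated" means identical sequences up to letter case. SEDA's actual operations offer more options, which are described in the user manual.

```python
import re


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: parse FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_pattern(records, pattern: str):
    """Hypothetical helper: keep records whose sequence matches a regex (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(h, s) for h, s in records if regex.search(s)]


def remove_duplicates(records):
    """Hypothetical helper: keep the first record of each sequence, ignoring letter case."""
    seen = set()
    unique = []
    for h, s in records:
        key = s.upper()
        if key not in seen:
            seen.add(key)
            unique.append((h, s))
    return unique


def write_fasta(records, width: int = 60) -> str:
    """Hypothetical helper: serialize records, wrapping sequences at `width` characters."""
    lines = []
    for h, s in records:
        lines.append(">" + h)
        lines.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fasta = ">seq1 sample A\nATGAAATTT\nGGG\n>seq2\nATGCCC\n>seq3 copy of seq1\nATGAAATTTGGG\n"
    kept = remove_duplicates(filter_by_pattern(read_fasta(fasta), "ATGAAA"))
    print(write_fasta(kept), end="")
```

Run on the small example, `seq2` is dropped because it does not contain `ATGAAA`, and `seq3` is dropped because its sequence duplicates `seq1`, so only `seq1` is printed. Chaining operations like this mirrors how FASTA-processing steps are typically combined into a pipeline.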
The operations are grouped into six categories: Alignment-related, BLAST, Filtering, Gene annotation, General, and Reformatting. Below is the complete list of SEDA 1.0 operations in each category:
Alignment-related
- Clustal Omega Alignment
- Concatenate sequences
- Consensus sequence
- Undo alignment
BLAST
- Blast
- Blast: two-way ortholog identification
Filtering
- Base presence filtering
- Filtering
- Pattern filtering
- Remove isoforms
- Remove redundant sequences
Gene annotation
- Augustus (SAPP)
- getorf (EMBOSS)
- ProSplign/ProCompart Pipeline
- Splign/Compart Pipeline
General
- Compare
- Grow sequences
- Merge
- Regular expression split
- Split
- Translate
Reformatting
- Disambiguate sequence names
- NCBI rename
- Reallocate reference sequences
- Reformat file
- Rename header
- Sort
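As an illustration of what the "Translate" operation involves, the following is a minimal Python sketch (not SEDA's implementation) that translates a nucleotide sequence using only the standard genetic code (NCBI translation table 1) in a chosen reading frame. The names `CODON_TABLE` and `translate` are hypothetical; the handling of RNA input (U treated as T), the use of `X` for codons containing ambiguous bases, and `*` for stop codons are assumptions of this sketch. SEDA's own operation may support other codon tables and options.

```python
# Hypothetical sketch: standard genetic code (NCBI translation table 1).
# Codons are enumerated in T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate `dna` starting at offset `frame` (0, 1 or 2); '*' marks stop codons."""
    seq = dna.upper().replace("U", "T")  # assumption: accept RNA input as well
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Assumption: codons with ambiguous bases (e.g. N) become 'X'.
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGAAATTTGGGTAA"))
```

For the example sequence this prints `MKFG*`: the codons ATG, AAA, TTT and GGG encode Met, Lys, Phe and Gly, and TAA is a stop codon. Building the table from the compact 64-letter string avoids typing out all 64 codons by hand.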
If you encounter a bug or would like to request new operations or features, please feel free to open an issue on the GitHub repository.