Need help: How can I run EMBOSS sixpack on a multi-sequence FASTA file?
5.5 years ago
shiv ▴ 10

Hello everyone,

I have ~10 FASTA files, each containing more than 50 sequences, and I want the 6-frame translation and ORFs for each sequence. I found EMBOSS sixpack for this, but according to the manual it accepts only a single sequence as input. Can you suggest other options (I may have missed something)? Is there a way to do this without splitting the files?

Thanks in advance

Assembly gene • 2.0k views
5.5 years ago
Michael 55k

You could build a workflow by combining your preferred solutions for:

  1. How To Split A Multiple Fasta or this aw(k)esome code by Pierre: A: Is there a way to split single .txt file with multiple fasta sequences into indi

  2. Bash Loop For Job Submission and here: A: Bash Loop For Job Submission (the sixpack call needs to be fully parameterized, otherwise sixpack prompts for user input; it also doesn't seem to read from STDIN or write to STDOUT)

The following should work on Linux and macOS without installing any additional software:

awk '/^>/ {if(x>0) close(outname); x++; outname=sprintf("_%d.fa",x); print > outname;next;} {if(x>0) print >> outname;}' *.fasta

for f in _*.fa
do
    # quote "$f" so file names with spaces or special characters are passed intact
    sixpack -sequence "$f" -outfile "$f.sixpack.out" -outseq "$f.sixpack.fa"
done

If you want to have the output in a single file, use cat to combine them.
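For anyone who would rather stay in Python, the same split-then-loop idea can be sketched roughly as below. This is only a minimal sketch: `split_fasta` and `sixpack_command` are hypothetical helper names, the `_N.fa` naming just mirrors the awk one-liner, and the sixpack options are the ones from the loop above.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, prefix="_"):
    """Write each record of a multi-FASTA file to its own file; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    try:
        with open(fasta_path) as src:
            for line in src:
                if line.startswith(">"):
                    # A header line starts a new record: close the previous file.
                    if handle is not None:
                        handle.close()
                    path = out_dir / f"{prefix}{len(paths) + 1}.fa"
                    handle = open(path, "w")
                    paths.append(path)
                if handle is not None:  # ignore anything before the first header
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return paths


def sixpack_command(record_path):
    """Build the sixpack argument list for one single-record FASTA file."""
    record_path = str(record_path)
    return [
        "sixpack",
        "-sequence", record_path,
        "-outfile", record_path + ".sixpack.out",
        "-outseq", record_path + ".sixpack.fa",
    ]
```

To actually run it (EMBOSS must be installed), loop over the paths returned by `split_fasta(...)` and pass each `sixpack_command(path)` to `subprocess.run(..., check=True)`. Unlike the awk counter, the numbering here restarts at 1 for every input file, so use a separate `out_dir` or `prefix` per input file.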


I think the last line of this answer might be what you want to do. Here's an example to concatenate files:

cat *.fasta > bigfasta.fasta

Also, I'm pretty sure the EMBOSS suite is available on a public Galaxy server for easy access.


Hi,

Thanks, Michael. I am trying to run your awk command from Python, but I get a syntax error every time. Can you please help me with this? Below is the code:

import subprocess

cmd = "awk '/^>/ {if(x>0) close(outname); x++; outname=sprintf("_%d.faa",x); print > outname;next;} {if(x>0) print >> outname;}' {}".format(FASTA_FILE_PATH)

subprocess.call(cmd, shell=True)

  • FASTA_FILE_PATH : full path of input fasta file

Why call awk from Python? Anyway, you have to adapt the quotes, e.g. escape them or use alternative quotes, like qq in Perl. I don't know if Python has similar functionality.

Of course it has something similar... https://stackoverflow.com/questions/29559905/does-python-have-an-equivalent-of-perls-qq

but simply not as powerful 💪
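To make the quoting advice concrete, here is one possible way to write the call, offered as a sketch rather than the only fix. Two things break the snippet in the question: the double quotes inside `sprintf("_%d.faa",x)` end the Python string early, and even with those escaped, `str.format` would trip over awk's curly braces. Passing the awk program as its own list element (no `shell=True`, no `.format`) avoids both. `AWK_SPLIT`, `awk_split_command` and `split_with_awk` are made-up names for illustration.

```python
import subprocess

# The awk program from the answer, stored in single-quoted Python strings so
# the double quotes inside sprintf() need no escaping.
AWK_SPLIT = (
    '/^>/ {if(x>0) close(outname); x++; outname=sprintf("_%d.faa",x); '
    'print > outname; next;} {if(x>0) print >> outname;}'
)


def awk_split_command(fasta_file_path):
    """Return the awk argument list; no shell is involved, so no shell quoting."""
    return ["awk", AWK_SPLIT, fasta_file_path]


def split_with_awk(fasta_file_path):
    """Run awk on one FASTA file; output files land in the current directory."""
    subprocess.run(awk_split_command(fasta_file_path), check=True)
```

With several input files, call `split_with_awk` once per path; note that each call restarts the `_1.faa` numbering, so earlier output would be overwritten unless you move it or change directory between calls.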


Hi,

As I said, I have more than one FASTA file, so I wrote a Python script to get the path of each input FASTA file. My problem is now solved. Thank you so much for the help!
