Question

Predict AA sequence from assembled transcripts

9.8 years ago

Adrian Pelin ★ 2.6k

Hello,

blastx is very slow these days. I was wondering whether there are any tools that take a FASTA file of assembled transcripts, find the longest open reading frame (ORF) in each one, and translate it into a peptide (amino acid) sequence.

Basically, I just want the coding regions so that I can use blastp, which is much faster. I tried Google, but I must be searching for the wrong thing.
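For illustration only, here is a minimal sketch of the operation being asked for, written as a bash function (`longest_orf`, a hypothetical name) that wraps awk. It translates all six reading frames with the standard genetic code and, for each transcript, prints the longest stretch that starts with ATG and has no internal stop codon. Letting an ORF run off the 3' end without a stop is an assumption made to tolerate incomplete assembled transcripts; getorf and TransDecoder apply their own, more careful rules.

```shell
#!/usr/bin/env bash
# Sketch: print the longest ATG-initiated ORF (six frames, standard code) per FASTA record.
# Assumption: an ORF may run off the 3' end without a stop codon (partial transcripts).
longest_orf() {
  awk '
    BEGIN {
      # Standard genetic code, codons ordered T, C, A, G at each position
      CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
      idx["T"] = 0; idx["C"] = 1; idx["A"] = 2; idx["G"] = 3
      comp["A"] = "T"; comp["T"] = "A"; comp["G"] = "C"; comp["C"] = "G"
    }
    # Reverse complement; any non-ACGT base becomes N
    function revcomp(s,   i, c, r) {
      r = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        r = r ((c in comp) ? comp[c] : "N")
      }
      return r
    }
    # Translate from the first base of s; codons with non-ACGT bases become X
    function translate(s,   i, a, b, c, p) {
      p = ""
      for (i = 1; i + 2 <= length(s); i += 3) {
        a = substr(s, i, 1); b = substr(s, i + 1, 1); c = substr(s, i + 2, 1)
        if ((a in idx) && (b in idx) && (c in idx))
          p = p substr(CODE, idx[a] * 16 + idx[b] * 4 + idx[c] + 1, 1)
        else
          p = p "X"
      }
      return p
    }
    # Within each stop-free segment, the longest ORF runs from its first M to the end
    function scan(p,   n, seg, k, m) {
      n = split(p, seg, "[*]")
      for (k = 1; k <= n; k++) {
        m = index(seg[k], "M")
        if (m > 0 && length(seg[k]) - m + 1 > length(best))
          best = substr(seg[k], m)
      }
    }
    # Check the three forward and three reverse-complement frames of the current record
    function report(   rc, f) {
      if (name == "") return
      best = ""
      rc = revcomp(seq)
      for (f = 1; f <= 3; f++) {
        scan(translate(substr(seq, f)))
        scan(translate(substr(rc, f)))
      }
      if (best != "") print ">" name "\n" best
    }
    /^>/ { report(); name = substr($1, 2); seq = ""; next }
    {
      line = toupper($0)
      gsub(/[^A-Z]/, "", line)   # drop whitespace, CR and gap characters
      gsub(/U/, "T", line)       # accept RNA input
      seq = seq line
    }
    END { report() }
  '
}
```

Usage, once the function is sourced: `longest_orf < contigs.fasta > longest_orfs.faa`, after which the peptide file can be searched with blastp. The header is reduced to its first word so IDs stay short in BLAST output.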

Adrian

RNA-Seq Assembly nt2aa • 2.5k views

updated 2.5 years ago by Ram 44k • written 9.8 years ago by Adrian Pelin ★ 2.6k

score 6 · Answer 1 · 2015-02-16

To add to the comment about EMBOSS tools, you can use my shell wrapper for EMBOSS's getorf:

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/getorf.sh

It has the following contents:

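#!/bin/bash
# $1: output format, "f" = FASTA or "t" = tab-delimited
# $2: options passed through to EMBOSS getorf, e.g. "-minsize=30 -maxsize=10000"
# Stages: linearise the input FASTA; run getorf on each record and put the
# original header back in place of getorf's "asis" ID; linearise getorf's output;
# parse "[start - end]" and "(REVERSE SENSE)" into the requested format.
# "$1" and "$2" are quoted so that a multi-option $2 reaches getorf intact
# (unquoted, perl -s would keep only the first option in $o).
perl -pe '/^>/?s/^>/\n>/:s/\s*$// if$.>1' | \
    perl -nse 'push @a, $_; @a = @a[@a-2..$#a];
    if ($. % 2 == 0){
        chomp $a[0];
        chomp $a[1];
        $r=qx/getorf -sequence=asis:$a[1] $o -stdout -auto 2>\/dev\/null/;
        $r =~ s/>asis/$a[0]/g;
        print $r}' -- -o="$2" | \
    perl -pe '/^>/?s/^>/\n>/:s/\s*$// if$.>1' | \
    perl -nse 'push @a, $_; @a = @a[@a-2..$#a];
    if ($. % 2 == 0){
        chomp $a[0];
        $a[0]=~/>(.*?) \[(\d+) - (\d+)\]\s*(.*)/;
        $s=(($4=~y===c)=="0")?"+":"-";
        if($f eq "f"){
            print ">".$1."_".$2."_".$3."_".$s."\n".$a[1]}
        elsif($f eq "t"){
            print $1."\t".$2."\t".$3."\t".$4."\t".$s."\t".$a[1]}}' -- -f="$1"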

Here is how you can use it:

cat contigs.fasta | ./getorf.sh "f" "-minsize=30 -maxsize=10000"

Check the usage details at:

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/emboss.html#getorf.sh

Also check out other shell wrappers for EMBOSS utilities here:

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/emboss.html

Best Wishes,

Umer

score 0 · Answer 2 · 2015-02-16

9.8 years ago

Neilfws 49k

EMBOSS tools such as transeq or getorf will do the translation for you, but you'll need to write (or find) a bit of code to extract the longest ORF for each transcript.
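As a sketch of that "bit of code", the hypothetical bash/awk function below (`longest_per_transcript`) keeps only the longest ORF per transcript from getorf's protein FASTA output. It assumes getorf names each ORF `<transcript>_<n>` in the first word of the header (the same naming the `>asis` substitution in the wrapper script relies on) and that transcript IDs contain no whitespace. Ties go to the ORF that appears first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the longest ORF per transcript from getorf protein FASTA.
# Assumes headers look like ">contig1_3 [100 - 402] ...", where "_3" is getorf's ORF counter.
longest_per_transcript() {
  awk '
    # Store the record held in hdr/seq if it is the longest seen for its transcript
    function flush(   id) {
      if (hdr == "") return
      id = substr(hdr, 2)            # drop the leading ">"
      sub(/[ \t].*$/, "", id)        # keep the first word, e.g. contig1_3
      sub(/_[0-9]+$/, "", id)        # strip the ORF counter, e.g. contig1
      if (!(id in len)) { order[++n] = id; len[id] = -1 }
      if (length(seq) > len[id]) { len[id] = length(seq); best[id] = hdr; bseq[id] = seq }
      hdr = ""; seq = ""
    }
    /^>/ { flush(); hdr = $0; seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END {
      flush()
      # Print one record per transcript, in order of first appearance
      for (i = 1; i <= n; i++) print best[order[i]] "\n" bseq[order[i]]
    }
  '
}
```

A possible invocation, mirroring the qualifiers the wrapper script passes to getorf: `getorf -sequence contigs.fasta -stdout -auto | longest_per_transcript > longest_orfs.faa`. Only the trailing `_<digits>` is stripped, so IDs that contain underscores (such as Trinity names) are grouped correctly.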

Ram · Answer 3 · 2015-03-10

9.7 years ago

seta ★ 1.9k

You can try TransDecoder: it takes your assembled transcripts in FASTA format and gives you the longest ORFs.

updated 2.5 years ago by Ram 44k • written 9.7 years ago by seta ★ 1.9k