Predict AA sequence from assembled transcripts
9.8 years ago
Adrian Pelin ★ 2.6k

Hello,

Blastx is very slow these days. I was wondering whether there are any tools that take a FASTA file of assembled transcripts, find the longest open reading frame in each one, and translate it to a peptide (amino acid) sequence.

Basically I just want the coding regions so that I can use blastp, which is much faster. I tried Google, but I must be searching for the wrong thing.

Adrian

RNA-Seq Assembly nt2aa • 2.5k views
9.8 years ago

Just to add to the above comment: you can use my shell wrapper for EMBOSS's getorf.

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/getorf.sh

It has the following contents:

#!/bin/bash
less <&0| \
    perl -pe '/^>/?s/^>/\n>/:s/\s*$// if$.>1' | \
    perl -nse 'push @a, $_; @a = @a[@a-2..$#a];
    if ($. % 2 == 0){
        chomp $a[0];
        chomp $a[1];
        $r=qx/getorf -sequence=asis:$a[1] $o -stdout -auto 2>\/dev\/null/;
        $r =~ s/>asis/$a[0]/g;
        print $r}' -- -o=$2 | \
        perl -pe '/^>/?s/^>/\n>/:s/\s*$// if$.>1' | \
        perl -nse 'push @a, $_; @a = @a[@a-2..$#a];
        if ($. % 2 == 0){
            chomp $a[0];
            $a[0]=~/>(.*?) \[(\d+) - (\d+)\]\s*(.*)/g;
            $s=(($4=~y===c)=="0")?"+":"-";
            if($f eq "f"){
                print ">".$1."_".$2."_".$3."_".$s."\n".$a[1]}
            elsif($f eq "t"){
                print $1."\t".$2."\t".$3."\t".$4."\t".$s."\t".$a[1]}}' -- -f=$1

Here is how you can use it:

cat contigs.fasta | ./getorf.sh "f" "-minsize=30 -maxsize=10000" 

Check the usage details at:

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/emboss.html#getorf.sh

Also check out other shell wrappers for EMBOSS utilities here:

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/emboss.html

Best Wishes,

Umer


This is a really cool script! Thanks!

9.8 years ago
Neilfws 49k

One of the EMBOSS tools such as transeq or getorf will do the translation for you, but you'll need to write or find a bit of code to extract the longest ORF.
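
For the "bit of code" part, here is a minimal sketch of one way to keep only the longest ORF per transcript. It assumes the input is getorf's own FASTA output, where each header looks like ">transcriptID_N [start - end]" (N being the ORF counter that getorf appends). The function name longest_orf is made up for this example, and ties are resolved by keeping the first ORF seen.

```shell
#!/bin/bash
# Sketch: read getorf-style FASTA on stdin, print the longest ORF per transcript.
# Assumption: headers look like ">transcriptID_N [start - end] ...", so removing
# the trailing "_N" from the first header token recovers the transcript ID.
longest_orf() {
    awk '
    # Store the record just read if it beats the current best for its transcript.
    function keep_best() {
        if (id != "" && length(seq) > best_len[id]) {
            best_len[id] = length(seq)
            best_hdr[id] = hdr
            best_seq[id] = seq
        }
    }
    /^>/ {
        keep_best()
        hdr = $0
        id = $1                    # first token, e.g. ">contig1_2"
        sub(/^>/, "", id)
        sub(/_[0-9]+$/, "", id)    # drop the ORF counter
        if (!(id in seen)) { seen[id] = 1; order[++n] = id }
        seq = ""
        next
    }
    {
        gsub(/[ \t\r]/, "")        # strip whitespace, join wrapped sequence lines
        seq = seq $0
    }
    END {
        keep_best()
        # Print transcripts in the order they first appeared in the input.
        for (i = 1; i <= n; i++)
            if (order[i] in best_hdr)
                print best_hdr[order[i]] "\n" best_seq[order[i]]
    }'
}
```

Something like `getorf -sequence contigs.fasta -stdout -auto | longest_orf > longest_orfs.faa` should then give one peptide per transcript, ready for blastp. Note that this picks by peptide length only, so it will not tell you whether the longest ORF is actually the real coding region.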

9.7 years ago
seta ★ 1.9k

You can try TransDecoder; it takes your assembly FASTA file and gives you the longest ORF for each transcript.
