Extract Cds Fastas From A Gff Annotation
4
0
Entering edit mode
9.7 years ago

As the title, I am trying to Extract Cds Fastas From A Gff Annotation ,please tell me where I was wrong .

Here is my code.

#!/usr/bin/perl-w

use strict;
use warnings;

my $fasta_file = "/Users/huli/Documents/IRGSP-1.0_genome.fasta";
my $gff_file = "/Users/huli/Documents/IRGSP-1.0_representative_2015-03-31/transcripts_output.gff";
my $outputfile = "/Users/huli/Documents/IRGSP-1.0_representative_2015-03-31/fasta_outputfile.fasta";

open (FASTA,"<", $fasta_file) or die "$!";
open (GFF,"<", $gff_file) or die "$!";
open (OUT,">", $outputfile) or die "$!";

my @array1 = $fasta_file; # useless assignment of scalar to array
my @array2 = $gff_file; # did you mean @array = <FASTA> ?
my %hash;
while(@array1 = <FASTA>){ # wrong combined use of iterator + while and # array context of <>
    chomp;
    if (/>/){
        my $_ = my $key; # mask $_ ?? now $_ is undef
        $key =~ s/>/ /g;
    }else{
    $hash{my $key} .=  $_;
    }
}

while (@array2 = <GFF>){
    chomp;
    @array2 = split("\t", $_);
    my $temp = substr($hash{$array2[0]}, $array2[3]-1, $array2[4]-$array2[3]);
    print OUT ">$array2[8]\n $temp\n";
}

close FASTA;
close GFF;
close OUT;
sequencing genome gene • 3.2k views
ADD COMMENT
1
Entering edit mode

I highlighted some obvious mistakes, you'll have gotten some warnings about redefined variables. Based on these mistakes, your codes does nothing most likely.

ADD REPLY
0
Entering edit mode

Thanks for your commend, I will amend my codes.

ADD REPLY
0
Entering edit mode

If you paste the error that are you getting we could help you easily :)

ADD REPLY
0
Entering edit mode

well, there are to many errors here, it just a little hard to paste.

ADD REPLY
0
Entering edit mode
my @array1 = $fasta_file;
my @array2 = $gff_file;

Do you really need this? I think defining as my @array1; my @array2; is enough there. Be cause you are using FH in while loop. or modify these two lines to

my @array1 = <FASTA>;
my @array2 = <GFF>;

and in while loop use simple while(@array1) and so.

ADD REPLY
0
Entering edit mode

Thank you a lot , I think I just get something in my brain, thank you so much.

ADD REPLY
0
Entering edit mode

Well, I amended my code, and here it is:

#!/usr/bin/perl-w

use strict;
use warnings;

my $fasta_file = "/Users/huli/Documents/IRGSP-1.0_genome.fasta";
my $gff_file = "/Users/huli/Documents/IRGSP-1.0_representative_2015-03-31/transcripts_output.gff";
my $outputfile = "/Users/huli/Documents/IRGSP-1.0_representative_2015-03-31/hehe_outputfile.txt";

open (FASTA,"<", $fasta_file) or die "$!";
open (GFF,"<", $gff_file) or die "$!";
open (OUT,">", $outputfile) or die "$!";

my @array1 = <FASTA>;
my @array2 = <GFF>;
my %hash;
while(<FASTA>){
    chomp;
    if (/>/){
        my $key = $_;
    }else{
    $hash{$key} .=  $_;
    }
}

while ( <GFF>){
    chomp;
    @array2 = split("\t", $_);
    $array2[8] =~ s/^parent=/>/g;
    my $temp = substr($hash{$array2[0]}, $array2[3]-1, $array2[4]-$array2[3]);
    print OUT ">$array2[8]\n $temp\n";
}

close FASTA;
close GFF;
close OUT;

and here are the errors:

Global symbol "$key" requires explicit package name at /Users/huli/Desktop/HEHE.pl line 22.
Execution of /Users/huli/Desktop/HEHE.pl aborted due to compilation errors.

I just learned perl for a short time, so when you guys give me answer, please give me some detailed answer. Thanks for your help. And could you tell me it is a right way to write a hash like me do?

ADD REPLY
1
Entering edit mode

Try to define $key variable outside the loop (my $key;), and then $key = $_.

And just a suggestion, for working with biological data, I would give a try to BioPerl, it has the most suitable modules for this kind of scripts.

ADD REPLY
0
Entering edit mode

Thanks for your answers,here are my new codes, and errors .

#!/usr/bin/perl-w

use strict;
use warnings;

my $fasta_file = "/Users/huli/Documents/IRGSP-1.0_genome.fasta";
my $gff_file = "/Users/huli/Documents/IRGSP-1.0_representative_2015-03-31/transcripts_output.gff";
my $outputfile = "/Users/huli/Documents/IRGSP-1.0_representative_2015-03-31/hehe_outputfile.txt";

open (FASTA,"<", $fasta_file) or die "$!";
open (GFF,"<", $gff_file) or die "$!";
open (OUT,">", $outputfile) or die "$!";

my @array2 ;
my %hash;
while(<FASTA>){
    chomp;
    my $key;
    if (/>/){
        $key = $_;
    }else{
        $hash{$key} .=  $_;
    }
}

while ( <GFF>){
    chomp;
    @array2 = split("\t", $_);
    $array2[8] =~ s/^parent=/>/g;
    my $temp = substr($hash{$array2[0]}, $array2[3]-1, $array2[4]-$array2[3]);
    print OUT ">$array2[8]\n $temp\n";
}

close FASTA;
close GFF;
close OUT;

This is a part of my errors, I know I need amend 22 lines, this line $hash{$key} .= $_; but I don't know how to amend it. I hope your guys can help me, thanks a lot.

Use of uninitialized value $key in hash element at /Users/huli/Desktop/HEHE.pl line 22, <FASTA> line 201256.
Use of uninitialized value $key in hash element at /Users/huli/Desktop/HEHE.pl line 22, <FASTA> line 201257.
Use of uninitialized value $key in hash element at /Users/huli/Desktop/HEHE.pl line 22, <FASTA> line 201258.
Use of uninitialized value $key in hash element at /Users/huli/Desktop/HEHE.pl line 22, <FASTA> line 201259.
Use of uninitialized value $key in hash element at /Users/huli/Desktop/HEHE.pl line 22, <FASTA> line 201260.
Use of uninitialized value $key in hash element at /Users/huli/Desktop/HEHE.pl line 22, <FASTA> line 201261.
Use of uninitialized value $key in hash element at /Users/huli/Desktop/HEHE.pl line 22, <FASTA> line 201262.
ADD REPLY
1
Entering edit mode
9.7 years ago
Juke34 9.0k

If you want experiment some code my answer is not at all useful.

But if you want simply extract the cds fasta from a gff annotation, the gffread utility from cufflinks (http://cole-trapnell-lab.github.io/cufflinks/file_formats/#the-gffread-utility) handles that perfectly well.

The command to use is the following:

gffread -x output_cds.fa -g your_genome.fa your_annotation.gff
ADD COMMENT
0
Entering edit mode

wish I knew about this a long time ago! great tool

ADD REPLY
1
Entering edit mode
9.7 years ago
Juke34 9.0k

You have to define $key outside the if statement because you try to use it in the else statement. In other terms, your $key defined in the if is only accessible in the if. In the else, $key is not defined, so you get an error.

Do this:

my $key;

while(<FASTA>){
    chomp;

    if (/>/){

      $key = $_;
    }else{
    $hash{$key} .=  $_;
    }
ADD COMMENT
1
Entering edit mode
9.7 years ago
Juke34 9.0k

As said several time before, You have just to write my $key; outside the while loop!

ADD COMMENT
0
Entering edit mode

Thanks for your answer, actually I moved my $key to the outside of the while loop, but there were still a lot of errors, since I have to use perl to solve this problem, maybe you can recommend some books to me . Anyway, thank you again.

ADD REPLY
0
Entering edit mode

Show us what are your errors then, we can maybe help you.

To learn perl seriously you can read the camel book published by O'Reilly.

http://en.wikipedia.org/wiki/Programming_Perl

ADD REPLY
0
Entering edit mode
9.7 years ago

Thanks for your guys, I finally got my right codes, and I will show them next, hope my codes can help someone like me .

#!/usr/bin/perl-w

use strict;
use warnings;

my $fasta_file = "IRGSP-1.0_genome.fasta";
my $gff_file = "transcripts.gff";
my $outputfile = "hehe_outputfile.txt";

open (FASTA,"<", $fasta_file) or die "$!";
open (GFF,"<", $gff_file) or die "$!";
open (OUT,">", $outputfile) or die "$!";

my $name;
my %hash_fasta;
while(<FASTA>){
    chomp;
    if (/>(chr[0-9]{1,2})/){
        $name = $1;
        $hash_fasta{$name} = '';
    }else{
        $hash_fasta{$name} .=  $_;
    }
}

my @array2 ;
my %hash_gff;
while ( <GFF>){
    chomp;
    @array2 = split;
    $array2[8] =~ s/^Parent=/>/g;
    my $strand = $array2[6];
        my $temp = substr($hash_fasta{$array2[0]}, $array2[3]-1, $array2[4]-$array2[3]);
        if($strand eq "-"){
            $temp =~ tr/AGCTagct/TCGAtcga/;
            $temp = reverse$temp;
            $_ = $temp;
        }elsif($strand eq "+"){
            $_ = $temp;
        }
    if(exists $hash_gff{$array2[8]}){
        $hash_gff{$array2[8]} .= $temp;
        }else{
            $hash_gff{$array2[8]} = $temp;
        }
}

foreach my $key(sort keys %hash_gff){
     print OUT "$key\n$hash_gff{$key}\n";
}


exit;
close FASTA;
close GFF;
close OUT;
ADD COMMENT

Login before adding your answer.

Traffic: 3662 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6