Question

Intron extraction from ensembl

1

Entering edit mode

10.5 years ago

ravihansa82 ▴ 130

Friends I got a problem when downloading first intron sequence from ensembl. I used this code perfectly few months ago but now it give me a warning message such as, Can't call method "canonical_transcript" on an undefined value at ./script.pl line 42, <FILE> line 100. without having any new line at the end of the input sequence file why this happen. can some one help me.

Here is the code

#!/usr/bin/perl

use strict;
use warnings;
use Bio::EnsEMBL::Registry;
use Bio::SeqIO;
use Getopt::Long;

my $registry = 'Bio::EnsEMBL::Registry';

## Load the databases into the registry
$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

## Get the gene adaptor for human
my $gene_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Gene' );
#my $gene_adaptor = $registry->get_adaptor( 'Mouse', 'Core', 'Gene' );

# declearing global variable
#my @arra = @_;
#my $line = $_;
my $line;
#my $l=$_;
   #opening the output file and write the sequences for abc file
   open (FILE_OUTPUT, ">my_sequence") or die "can't write";

   #opening the input file which is having ensemble gene IDS
   open (FILE,"seq") or die "Error";

    my @array = <FILE>;#assigning input file content to @array
        #run foreach loop and real each line by line and each line assigning to $line var
    foreach $line(@array){
          # this removes new line charactor
          chomp $line;
        # Fetch my gene of interest usning ensemble ID
        my $gene = $gene_adaptor->fetch_by_stable_id( $line );

        ## Get all transcripts for the gene
        ## Get canonical transcript for the gene which is the longest transcript
        my $transcript = $gene->canonical_transcript;

        ## For the canonical transcript get all introns and print the sequence of the first intron of each
          my $introns = $transcript->get_all_Introns;
          #my $first_intron = @$introns[0];
        if (@$introns){
        my $first_intron = @$introns[0];
        # print the gene stable ID, transcript stable ID,gene external name ,
        #seq length and firstintron sequence to the output fiel
          FILE_OUTPUT ->print
          ("\>", $gene->stable_id,";",$transcript->stable_id,";",$gene->external_name,";","sequence length",";",
        $first_intron->length,"\n",$first_intron->seq,"\n");
}
}
#close input file
close (FILE);
#close output file
close (FILE_OUTPUT);

sample ids

ENSG00000167972
ENSG00000023839
ENSG00000091262
ENSG00000164163
ENSG00000140526
ENSG00000138443
ENSG00000138107
ENSG00000135503
ENSG00000121989
ENSG00000139567
ENSG00000119640
ENSG00000151694
ENSG00000188778
ENSG00000035687
ENSG00000038002
ENSG00000018510
ENSG00000151320
ENSG00000059573
ENSG00000180318
ENSG00000133805
ENSG00000166313
ENSG00000163697
ENSG00000134982
ENSG00000062725
ENSG00000240583
ENSG00000100852
ENSG00000032219
ENSG00000196843
ENSG00000133794
ENSG00000157399
ENSG00000099889
ENSG00000198356
ENSG00000123268
ENSG00000155097
ENSG00000066427
ENSG00000167601
ENSG00000030110
ENSG00000114200
ENSG00000083123
ENSG00000110987
ENSG00000105829
ENSG00000242252
ENSG00000182492
ENSG00000100290
ENSG00000168487
ENSG00000112175
ENSG00000261236
ENSG00000172331
ENSG00000137274
ENSG00000139618
ENSG00000156983
ENSG00000109743
ENSG00000174808
ENSG00000112763
ENSG00000026950
ENSG00000186470
ENSG00000197223
ENSG00000171860
ENSG00000167434
ENSG00000164326
ENSG00000164305
ENSG00000137757
ENSG00000108091
ENSG00000181374
ENSG00000108691
ENSG00000108688
ENSG00000108700
ENSG00000112237
ENSG00000121797
ENSG00000177455
ENSG00000004468
ENSG00000102245
ENSG00000196776
ENSG00000116815
ENSG00000153563
ENSG00000004897
ENSG00000079112
ENSG00000170558
ENSG00000163624
ENSG00000153879
ENSG00000115163
ENSG00000153774
ENSG00000167670
ENSG00000085872
ENSG00000089199
ENSG00000133063
ENSG00000274542
ENSG00000175344
ENSG00000179583
ENSG00000109572
ENSG00000011021
ENSG00000184113
ENSG00000132514
ENSG00000172409
ENSG00000122705
ENSG00000118432
ENSG00000135775
ENSG00000121058
ENSG00000134871
ENSG00000169031

sequence genome gene ensembl • 2.6k views

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by ravihansa82 ▴ 130

Ram · Answer 1 · 2014-11-18

4

Entering edit mode

10.5 years ago

Jean-Karim Heriche 27k

The error message tells you what's wrong. Your gene object is undefined which means that the gene ID you're using doesn't exist in the Ensembl database version you're using. If you want your script to output the same thing as a few months ago then make sure you use the same version of the database you had then or if you want to use a more recent version, check that your gene IDs are still valid.

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

This is correct. You could put in a loop like:

if ($gene) {
 my $transcript = $gene->canonical_transcript;
}
else {
 print $gene->stable_id, "not found.\n"
}

Of you could use the ArchiveStableIDAdaptor to find the genes using out-of-date IDs.

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by Emily 24k

0

Entering edit mode

Thanks Emily, I got the point you mentioned...

ADD REPLY • link 10.5 years ago by ravihansa82 ▴ 130