Perl script for extracting information on different genes from GFF files
7.6 years ago

Assalam o alaikum everyone

I have to write a Perl script that extracts gene information from annotation (GFF) files. I have GFF files for different genomes downloaded from NCBI, plus two text files: one contains the full paths of the GFF files, and the other contains the list of genes whose information has to be extracted from the annotation files. I have already extracted this information with the grep command, but now I have to write a Perl script for it. I'm new to Perl; could you kindly guide me on how to do this?

Thanks in advance :)

$ cat annotation_files_path

~/home/usr/NCBI_Mammals_genomes/Nine_banded_armadillo/

~/home/usr/NCBI_Mammals_genomes/Noth_American_deer_mouse/

~/home/usr/NCBI_Mammals_genomes/Northern_mole_vole/

$ cat Genes_list

acmsd

smo

amy2b
perl script coordinates fetching • 3.0k views

Since you say you 'have' to write a perl script, have you tried any code? No one should write something for you if this is homework. That's not the purpose of this site. However, if you post a snippet of your code with an error message that you're having trouble with, someone can give you helpful hints.


If you have to write Perl code, then the Perl code by Alex below may work. If you are a bioinformatician, you can use a Perl module for intersection: http://search.cpan.org/~cjfields/BioPerl-Run/lib/Bio/Tools/Run/BEDTools.pm

7.6 years ago
system("grep -f genes_list.txt some_annotations.gff > answer.gff");

Or:

#
# read genes into array, skipping blank lines
# (an empty entry would make the pattern below match every line)
#
my @genes;
my $genesFn = "genes_list.txt";
open my $genesFh, "<", $genesFn or die "could not open genes file handle: $!\n";
while (<$genesFh>) {
    chomp;
    next unless /\S/;
    push @genes, $_;
}
close $genesFh;

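#
# build a case-insensitive regular expression pattern from genes
# (quotemeta escapes any regex metacharacters in a gene name)
#
my $regex = sprintf('(?i)(%s)', join ('|', map { quotemeta } @genes));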

#
# write any matches between annotation line and pattern to results file
#
my $annotationsFn = "some_annotations.gff";
my $resultsFn = "answer.gff";
open my $annotationsFh, "<", $annotationsFn or die "could not open annotations file handle: $!\n";
open my $resultsFh, ">", $resultsFn or die "could not open handle to results file: $!\n";
while (<$annotationsFh>) {
    chomp;
    if ($_ =~ /$regex/) {
        print $resultsFh "$_\n";
    }
}
close $resultsFh;
close $annotationsFh;
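
One caveat with plain pattern matching: a short gene symbol such as smo also matches longer names that contain it (SMOC1, for example). Below is a minimal sketch of a stricter filter. It assumes NCBI-style GFF3 lines whose ninth column carries a gene=SYMBOL attribute; the helper names gene_of_line and line_matches and the two demo lines are made up for illustration.

```perl
use strict;
use warnings;

# Return the value of the "gene" attribute from one GFF3 line, or nothing.
# Assumption: column 9 holds key=value pairs separated by ";".
sub gene_of_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blank lines
    my @cols = split /\t/, $line;
    return unless @cols >= 9;                   # not a complete feature line
    for my $pair (split /;/, $cols[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        return $value if defined $value && $key eq 'gene';
    }
    return;
}

# True (1) if the line's gene attribute is one of the wanted genes,
# compared case-insensitively; false (0) otherwise.
sub line_matches {
    my ($line, $wanted) = @_;
    my $gene = gene_of_line($line);
    return (defined($gene) && exists($wanted->{ lc $gene })) ? 1 : 0;
}

# Lookup table of wanted genes, lowercased.
my %wanted = map { lc($_) => 1 } qw(acmsd smo amy2b);

# Two made-up feature lines: only the first has gene=SMO.
my @demo = (
    "chr1\tGnomon\tgene\t1\t100\t.\t+\t.\tID=gene-SMO;Name=SMO;gene=SMO\n",
    "chr1\tGnomon\tgene\t200\t300\t.\t+\t.\tID=gene-SMOC1;Name=SMOC1;gene=SMOC1\n",
);
print for grep { line_matches($_, \%wanted) } @demo;   # prints only the SMO line
```

A plain /smo/i test would keep both demo lines; comparing the whole attribute value keeps only the first. If your files name genes under a different attribute, change the key checked in gene_of_line.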

Thank you so much, this is very helpful. Before this I didn't even know where to start.

Actually, the output of this code contained the whole annotation file. After a small edit, it gives the gene information in a single file:

my $annotationsFn = "GCF_000298275.1_OryAfe1.0_genomic.gff";
my $resultsFn = "answer.gff";
open my $annotationsFh, "<", $annotationsFn or die "could not open annotations file handle!\n";
open my $resultsFh, ">", $resultsFn or die "could not open handle to results file!\n";
while (<$annotationsFh>) {
    chomp;
    if ($_ =~ /ACMSD/ || $_ =~ /CRYM/ || $_ =~ /ARID1B/ ) {
        print $resultsFh " $_\n";
    }
}

I have two problems now. First, the information for all genes ends up in a single file, but I want a separate output file for each gene. I tried a for loop for this, but it did not work.

Second, and more importantly, I have to work on about 150 annotation files. I put the complete paths of all annotation files in annotation_file_path.txt, but the script fails: it only works when a single annotation file is used as shown above; otherwise it prints the whole annotation_file_path.txt. Is there any way to iterate over each path in annotation_file_path.txt one by one? Kindly guide me.

P.S. I'm a beginner, so I would be happy to get more information.


Look up arrays and looping over elements in an array.
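
As a rough sketch of that idea, the script below reads both list files into arrays, loops over the annotation paths one at a time, and writes one output file per gene for each annotation file. It assumes every line of the path list is a usable path to a GFF file (Perl's open does not expand a leading ~, so write absolute paths). The sub names read_list and extract_genes and the output naming scheme <genome>.<gene>.gff are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Basename qw(basename);

# Read the non-empty lines of a text file into a list, trimming whitespace.
sub read_list {
    my ($fn) = @_;
    open my $fh, "<", $fn or die "could not open $fn: $!\n";
    my @items;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;          # strip surrounding whitespace and newline
        push @items, $line if length $line;
    }
    close $fh;
    return @items;
}

# For one annotation file, write one output file per gene into $outDir.
# Returns a hash of gene => number of matching lines.
sub extract_genes {
    my ($gffFn, $outDir, @genes) = @_;
    my $label = basename($gffFn);
    $label =~ s/\.gff3?$//;               # genome label taken from the file name

    # one output handle per gene (hypothetical naming: <label>.<gene>.gff)
    my (%outFh, %count);
    for my $gene (@genes) {
        my $outFn = "$outDir/$label.$gene.gff";
        open $outFh{$gene}, ">", $outFn or die "could not open $outFn: $!\n";
        $count{$gene} = 0;
    }

    open my $gffFh, "<", $gffFn or die "could not open $gffFn: $!\n";
    while (my $line = <$gffFh>) {
        next if $line =~ /^#/;            # skip GFF comment and directive lines
        for my $gene (@genes) {
            if ($line =~ /\Q$gene\E/i) {  # literal, case-insensitive match
                print { $outFh{$gene} } $line;
                $count{$gene}++;
            }
        }
    }
    close $gffFh;
    close $_ for values %outFh;
    return %count;
}

# Runs only when both list files are given on the command line.
if (@ARGV == 2) {
    my @paths = read_list($ARGV[0]);      # file with one annotation path per line
    my @genes = read_list($ARGV[1]);      # file with one gene name per line
    for my $gffFn (@paths) {              # one annotation file at a time
        my %count = extract_genes($gffFn, ".", @genes);
        print "$gffFn: $_ = $count{$_} line(s)\n" for sort keys %count;
    }
}
```

Saved under a name of your choice, say extract_genes.pl, it would be run as: perl extract_genes.pl annotation_file_path.txt genes_list.txt. Output files are written to the current directory, and a gene with no hits still gets an empty file.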
