Dear all, I am a complete beginner in Perl. I have finished reading several chapters of Learning Perl and have now tried to write a script. The SEQ file contains multiple gene sequences, while the LIST file is a list of gene names. Below is an example.
SEQ file:
>gene 1
gctagtcagtacgtacgtac
>gene 2
cgagagagaggccgcgagcatcagtacgtgtac
>gene 3
ccgcggggcggccgcggcccccgcggcgcgcgatta
LIST file:
>gene2
>gene3
My script is as follows:
#!/usr/bin/perl
use strict;
my ($seq, $list) =@ARGV;
open SEQ, "$seq";
open LIST, "$list";
my @seq= <SEQ>;
my @seq = split;
my @list= <LIST>;
my @list = split;
my $i=0;
my $j=0;
while ($i<$#seq){
while ($j < $#list){
if ($seq[$i] eq $list[$j]){
print "$seq[$i] is found\n";
}
$j++;
}
$i +=2;
}
close SEQ;
close LIST;
Could anyone who works with Perl offer some corrections, suggestions and comments? I appreciate your help, which I believe will drive me forward in the bioinformatics field. Thanks a lot!
If you are working on a Unix machine and your list file isn't too long, you should probably go with:
If your list file is really large, have a look at the Kent tools, in particular the programme faSomeRecords.
Cheers
Thanks for the handy method!
no worries... ;)
I think you should mention what you want your script to do.
Sorry for the unclear description. I aim to extract sequences from the SEQ file according to the LIST file. In my script, the line print "$seq[$i] is found\n" should be print "$seq[$i] ($seq[$i+1]) is found\n". Anyway, my script is wrong somewhere: when I run perl test.pl seq.fa list.txt, I get no output at all. Thanks for your help!
First ask yourself, what does this do :
What do you hope to achieve? Tip: add the line "use Data::Dumper;" to your script and print a variable using "print Dumper($variable);" to start debugging. I will not give you the answer, since I feel this is a learning opportunity for you.
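To make the Data::Dumper tip concrete, here is a minimal sketch. It is an illustration only: the contents of @list are a hypothetical stand-in for lines read from the LIST file, and setting $Data::Dumper::Useqq (so that newlines are printed as a visible \n) and passing the array by reference are my additions, not part of the tip itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Assumption: show strings double-quoted so that "\n" is visible in the dump.
$Data::Dumper::Useqq = 1;

# Hypothetical stand-in for what "my @list = <LIST>;" would read from the file.
my @list = (">gene2\n", ">gene3\n");

# Dump the array (passed by reference) to see exactly what each element holds,
# including the trailing newline that reading a line from a file leaves behind.
print Dumper(\@list);

# Strip the trailing newline from every element, then dump again to compare.
chomp(@list);
print Dumper(\@list);
```

Dumping a variable right after each step that assigns to it is usually the quickest way to see whether it holds what you expect before you start comparing strings with eq.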