Dear all, happy New Year!
I need to extract all FASTA sequences, together with their headers, whose lengths fall between minlen and maxlen, reading many FASTA files one after another. The script at the bottom seems to work for a single file.
I saw this example somewhere on Biostar (thanks to the author!) and modified it slightly; the original used only a single length limit. However, I have about one hundred FASTA files with different names and a sequence length range (>50 and <100), and I am not good with awk, so I cannot make this shorter. Could you please help me? Many thanks in advance!
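#!/usr/bin/perl
use strict;
use warnings;
# Extract FASTA records whose sequence length is between minlen and maxlen (inclusive).
# The script: FRAGM_both_values_fasta_trial.pl
# Take the two limits off @ARGV so that <> reads only the FASTA files that follow;
# otherwise <> would also try to open files named after the two numbers.
my $minlen = shift @ARGV;
my $maxlen = shift @ARGV;
{
    local $/ = ">";    # read one FASTA record at a time; restored when the block ends
    while (<>) {
        chomp;               # remove the trailing ">" that starts the next record
        next unless /\w/;    # skip the empty chunk before the first ">"
        my @chunk = split /\n/;
        my $header = shift @chunk;              # first line is the header
        my $seqlen = length join "", @chunk;    # remaining lines are the sequence
        print ">$_" if $seqlen >= $minlen && $seqlen <= $maxlen;
    }
}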
# Then invoke the script like this:
# perl FRAGM_both_values_fasta_trial.pl 5 10 example_fasta.txt > example_fasta_5_10.txt
# I made the range narrower (from 5 to 10) for testing.
example_fasta.txt
>one
ETGT
>TWO
DGJLKTFJG
>THREE
DHSFRUTYIPUTE
>FOUR
DGFJTI
>FIVE
ADKLPFGGHH
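For the hundred-file case, one possible direction is a small shell wrapper around awk. This is only a minimal sketch, not a tested solution for real data: the function names filter_fasta and filter_all, the output directory name, and the *.fasta pattern in the usage line are all made up for illustration. It keeps the same inclusive limits (>= minlen and <= maxlen) as the Perl script and assumes plain Unix line endings.

```shell
#!/bin/sh
# filter_fasta MIN MAX FILE   (hypothetical helper name)
# Print the records of FILE whose total sequence length is between
# MIN and MAX, inclusive. Multi-line sequences are summed up.
filter_fasta() {
    awk -v min="$1" -v max="$2" '
        # Print the record collected so far if its length is in range.
        function flush_record() {
            if (header != "" && seqlen >= min && seqlen <= max)
                printf "%s\n%s", header, body
        }
        # A header line ends the previous record and starts a new one.
        /^>/ { flush_record(); header = $0; body = ""; seqlen = 0; next }
        # Any other line is sequence: store it and add up its length.
        {
            body = body $0 "\n"
            seqlen += length($0)
        }
        END { flush_record() }
    ' "$3"
}

# filter_all MIN MAX OUTDIR FILE...   (hypothetical helper name)
# Filter every FILE and write the result to OUTDIR under the same file name.
filter_all() {
    min=$1; max=$2; outdir=$3
    shift 3
    mkdir -p "$outdir"
    for f in "$@"; do
        filter_fasta "$min" "$max" "$f" > "$outdir/$(basename "$f")"
    done
}
```

With these definitions loaded, something like `filter_all 50 100 filtered *.fasta` would write one filtered copy per input file into a separate directory, so the outputs are not picked up as inputs on a second run.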
Thank you very much, I will try.
Best wishes,
Natasha