Hello everyone,
I am trying to extract the lines that start with ">>" and run up to "Complete" from my input file.
INPUT FILE:
Read Sequence:ENSG00000110092|ENST00000227507 (3192 nt)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing Scan: hsa-miR-193b-5p
vs ENSG00000110092|ENST00000227507
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Score for this Scan:
No Hits Found above Threshold
Complete
Read Sequence:ENSG00000169429|ENST00000307407 (1252 nt)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing Scan: hsa-miR-491-5p
vs ENSG00000169429|ENST00000307407
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Forward: Score: 125.000000 Q:3 to 23 R:1060 to 1083 Align Len (23) (78.26%) (86.96%)
Scores for this hit:
**>hsa-miR-491-5p**
ENSG00000169429|ENST00000307407 125.00 -19.70 0.00 3 23 1060 1083 23 78.26% 86.96%
Score for this Scan:
Seq1,Seq2,Tot Score,Tot Energy,Max Score,Max Energy,Strand,Len1,Len2,Positions
**>>hsa-miR-491-5p**
ENSG00000169429|ENST00000307407 125.00 -19.70 125.00 -19.70 9000 22 1252 1059
Complete
Read Sequence:ENSG00000109320|ENST00000226574 (708 nt)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing Scan: hsa-miR-193b-5p
vs ENSG00000109320|ENST00000226574
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Score for this Scan:
No Hits Found above Threshold
Complete
##Perl Script ##############
chomp($input=<STDIN>);
open(IN,$input) or die "Can not open the file";
@cont=<IN>;
foreach $line (@cont)
{
    if($line=~/>>/)
    {
        chomp($line);
        print "$line\n";
    }
}
It's giving me output like this: 3.txt
>>hsa-miR-491-5p
>>hsa-miR-491-5p
>>hsa-miR-491-5p
>>hsa-miR-639
>>hsa-miR-639
But I want output like this:
>>hsa-miR-491-5p
ENSG00000169429|ENST00000307407 125.00 -19.70 125.00 -19.70 9000 22 1252 1059
>>hsa-miR-491-5p
ENSG00000169429|ENST00000307407 125.00 -18.30 125.00 -18.30 9000 22 1252 1028
>>hsa-miR-491-5p
ENSG00000169429|ENST00000307407 125.00 -14.70 125.00 -14.70 9000 22 1252 1059
I have also tried the following script:
print "Enter input file:\n";
chomp($filename=<STDIN>);
unless(open(FH,$filename))
{
    print "Cannot open the file..\n";
    exit;
}
open(OUT,">targets.txt") or die "can't help it";
@cont=<FH>;
close FH;
$flag=0; $seq=""; @anno=();
foreach (@cont)
{
    if($_=~/^Complete/)
    {
        last;
    }
    elsif($_=~/^\s+>>/)
    {
        $flag=1;
    }
    elsif($flag==1)
    {
        $seq.=$_;
    }
    else
    {
        push(@anno,$_);
    }
}
print OUT $seq,"\n";
But it finds the first hit and then goes straight to the last "Complete", printing all the content in between. Could anyone suggest what I should do to get the output I want? Any help would be appreciated. Thanks.
Just a minor adjustment to the regex in the script below (added \*\*) was needed to make the script produce your desired output. Hope this helps!
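For illustration, here is a minimal sketch of one way the extraction could be written. It assumes the ">>" lines in the file really are wrapped in literal ** as in the sample input (the regex treats the asterisks as optional), and that everything between a ">>" line and the next "Complete" is wanted. The sub name extract_hits and the file name in the usage comment are hypothetical, chosen only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns each ">>" line plus the lines that follow it,
# up to (but not including) the next "Complete" line.
sub extract_hits {
    my @lines = @_;
    my @out;
    my $in_hit = 0;
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ /^\s*(?:\*\*)?>>/) {
            $in_hit = 1;            # a hit block starts here
            $text =~ s/\*\*//g;     # drop the literal ** markers, if present
            push @out, $text;
        }
        elsif ($text =~ /^\s*Complete/) {
            $in_hit = 0;            # block ends; keep scanning for more hits
        }
        elsif ($in_hit) {
            push @out, $text;       # a data line inside a hit block
        }
    }
    return @out;
}

# Usage (file name is an assumption): perl extract_hits.pl 3.txt
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for extract_hits(<$in>);
    close $in;
}
```

The flag is reset at every "Complete" instead of leaving the loop with last, so later hits in the file are still found. The first script in the question printed only the ">>" line because it never looked at the line after it.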