Help Needed: Perl Script To Read Sequence, Add Sequence Header In The Output File
11.5 years ago

Hi, sorry to bother.

I think it is a very simple question for the professional programmers:

I have the following sequences in a .txt file:

AGCCCTCTGTAGCATTTGTATGGC
AGCCCTCTGTAGTATTTCTATGGC
AGCCCTCTGTAGTATTTGTATGGCTCCTTAGAC
...

I want them to be like:

>Leaf_1
AGCCCTCTGTAGCATTTGTATGGC
>Leaf_2
AGCCCTCTGTAGTATTTCTATGGC
>Leaf_3
AGCCCTCTGTAGTATTTGTATGGCTCCTTAGAC
...

My script does not work. Don't know why. Please help correct it. Thanks!

die "perl $0 < Input txt >\n" unless(@ARGV == 1);
open IN,$ARGV[0];
open OUT,">Output.fa";
my $count = 1;

while(<IN>){
  if($_=~/[ATCG]/gi) 
  {
     print ">Leaf_$count\n";
     print OUT ">Leaf_$count\n";
    $count++;
   }
}
close IN; close OUT;
perl sequence • 4.6k views

This is extremely similar to all your other questions, especially Perl script to extract. Also please use sensible tags for your questions (i.e. not "perl to add sequence header").

11.5 years ago
Kenosis ★ 1.3k

Here's another option:

use strict;
use warnings;

while(<>){
    print ">Leaf_$.\n$_";
}

Usage: perl script.pl inFile [>outFile]

The last, optional parameter directs output to a file.

As a one-liner: perl -ne 'print ">Leaf_$.\n$_"' inFile [>outFile]

Output on your dataset:

>Leaf_1
AGCCCTCTGTAGCATTTGTATGGC
>Leaf_2
AGCCCTCTGTAGTATTTCTATGGC
>Leaf_3
AGCCCTCTGTAGTATTTGTATGGCTCCTTAGAC

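Both take advantage of Perl's $. variable, which holds the current line number of the filehandle most recently read. Note that every input line gets a header, so the input should contain one sequence per line and no blank lines.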
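One caveat if several input files are passed at once: <> never closes ARGV explicitly, so $. keeps counting across files. A minimal sketch of the difference follows, assuming the numbering should restart for each file. The helpers make_file and headers_for are hypothetical names used only for this demo, and the short sequences are made-up test data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo helper (hypothetical): write some lines to a temporary file
# and return its path. The file is removed when the program exits.
sub make_file {
    my @lines = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} "$_\n" for @lines;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Read all given files through <> and return the headers that
# ">Leaf_$." would produce. If $reset is true, ARGV is closed at the
# end of each file, which resets $. to 0 for the next one.
sub headers_for {
    my ($reset, @paths) = @_;
    local @ARGV = @paths;
    my @headers;
    while (my $line = <>) {
        push @headers, ">Leaf_$.";
        close ARGV if $reset && eof;
    }
    return @headers;
}

my $first  = make_file('AGCC', 'AGCT');
my $second = make_file('TTGA');

# Numbering restarts for the second file.
print join(' ', headers_for(1, $first, $second)), "\n";
# Numbering continues across both files.
print join(' ', headers_for(0, $first, $second)), "\n";
```

For a single input file, as in the question, the two variants behave identically.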

Hope this helps!


Missing the '>'


Corrected. Thank you.


In this case, you are assuming that each line in the text file ends with a newline character. If it does not, this program will not print the file as intended. You need to chomp each line first and then add the newline character in the print statement.


No need to chomp and then add "\n": each line (except perhaps the last) ends with a record separator, or else there wouldn't be lines, and the default value of the input record separator ($/) is "\n". See $/ in perlvar.
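The two cases where the line ending does matter are a last line with no trailing newline and a file with Windows (CRLF) line endings. A minimal sketch that covers both, assuming neither can be ruled out for the input; fasta_record is a hypothetical helper name and the demo input is held in memory rather than read from a file:

```perl
use strict;
use warnings;

# Hypothetical helper: build one FASTA record from a raw input line.
# Strips a trailing LF or CRLF if present, then adds exactly one "\n".
sub fasta_record {
    my ($line, $number) = @_;
    $line =~ s/\r?\n\z//;
    return ">Leaf_$number\n$line\n";
}

# Demo input: CRLF after the first line, no newline after the last.
my $input = "AGCCCTCTGTAGCATTTGTATGGC\r\nAGCCCTCTGTAGTATTTCTATGGC";

open my $in, '<', \$input or die "Cannot open in-memory input: $!";
while (my $line = <$in>) {
    print fasta_record($line, $.);
}
close $in;
```

Plain chomp would leave the "\r" in place on a Unix system, which is why the sketch uses a substitution instead.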

11.5 years ago
Jordan ★ 1.3k

Proper programming practice is to use strict and warnings.

You also need to chomp the lines to remove the newline characters. For example, try this code:

use strict;
use warnings;

if (@ARGV != 1) {
    die "Usage: print_fasta.pl <raw_input_file>\n";
}

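# Three-argument open with lexical filehandles
open(my $in, '<', $ARGV[0]) or die "Error!! Cannot open $ARGV[0]: $!\n";
open(my $out, '>', 'Output.fa') or die "Error!! Cannot create the file: $!\n";

# Read all lines at once and strip their newlines
my @file = <$in>;
chomp(@file);
my $count = 1;

foreach my $line (@file) {
    print $out ">Leaf_$count\n";
    print $out "$line\n";
    $count++;
}

close($in);
close($out) or die "Error!! Cannot close the file: $!\n";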

NOTE: Here I'm assuming that all the lines in the text file only have the sequence in each line, like in your example.
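If that assumption might not hold, the lines can be checked before they are numbered. A minimal sketch, assuming blank lines should be skipped and that valid sequences contain only A, C, G, T or N (the alphabet is an assumption, adjust it for your data). It keeps its own counter so that skipped lines do not leave gaps in the numbering; lines_to_fasta is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw lines into FASTA records, numbering
# only the lines that actually contain a sequence.
sub lines_to_fasta {
    my @lines = @_;
    my @records;
    my $count = 0;
    for my $line (@lines) {
        $line =~ s/\s+\z//;                # drop newline and trailing spaces
        next if $line eq '';               # skip blank lines
        if ($line !~ /\A[ACGTN]+\z/i) {    # assumed alphabet: A, C, G, T, N
            warn "Skipping non-sequence line: $line\n";
            next;
        }
        $count++;
        push @records, ">Leaf_$count\n$line\n";
    }
    return @records;
}

# Demo on made-up lines: one blank line and one invalid line are skipped.
print lines_to_fasta("AGCC\n", "\n", "AGTT\n", "not a sequence\n");
```

Unlike the script in the question, which prints only the header for each matching line, this emits the header followed by the sequence itself.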

11.5 years ago
csiu ▴ 60

Warning: I am also not a professional, but the following works :)

$ perl below-script.pl input-sequences.txt

#!/usr/bin/perl

open (INPUT, $ARGV[0]) or die $!;
open (OUTPUT, ">Output.fa") or die $!;

my $count = 1;
while (<INPUT>){
    chomp;                                 # remove the trailing newline
    print OUTPUT ">Leaf_$count\n$_\n";     # header line, then the sequence
    $count += 1;
}

close (OUTPUT);
close (INPUT);

Problem solved. Thank you!
