Question

Could someone explain this Perl command?

6.9 years ago

Seq225

Hi,

I have a bash script and it looks like this:

#!/bin/bash
for i in *dat.gz
do gunzip $i
echo uniprot_sprot_archaea.dat | perl -slane '$a=(split /\_/, $_)[2]; $a=~/(\w+).dat/; $b=$1; print "perl screen_complete_proteome_from_uniprot_division.pl \$i >> uniprot_".$b.".fasta"' -- -i=$i
done

I don't know how to code, but I need to understand these Perl commands. I don't understand anything from echo to the end of the command. Could someone please explain it?

Thanks a ton, and sorry for this silly request.

Tags: bash, perl

I doubt this is working as intended. What are you trying to do?
For instance, while the bash script will unzip all dat.gz files, the Perl line will repeatedly work on the same string, uniprot_sprot_archaea.dat. The split extracts archaea.dat, the regular expression then captures archaea, and so the Perl one-liner will print the following every time the bash script unzips a file:
perl screen_complete_proteome_from_uniprot_division.pl $i >> uniprot_archaea.fasta
Note the literal $i in the output: the backslash in \$i tells Perl not to interpret the $ sign as the start of a variable, so the text $i is printed as-is.
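
One way to convince yourself of this is to replay the one-liner's logic in a stand-alone Perl script. This is only a sketch: the two file names are hypothetical, and the echoed string is hard-coded exactly as in the question, so the loop variable never influences the result.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical file names that the bash glob *dat.gz might match.
my @files = ('uniprot_sprot_archaea.dat.gz', 'uniprot_trembl_bacteria.dat.gz');

my @printed;
for my $file (@files) {
    # The bash script always echoes this fixed string; $file is never used.
    my $line = 'uniprot_sprot_archaea.dat';

    # The one-liner's body: third underscore-separated field ...
    my $field = (split /_/, $line)[2];        # "archaea.dat"
    # ... then the word characters in front of ".dat" (regex as in the question).
    my ($div) = $field =~ /(\w+).dat/;        # "archaea"

    # '\$i' in the one-liner is an escaped dollar sign, so the text "$i"
    # ends up in the output literally; nothing is executed.
    my $cmd = 'perl screen_complete_proteome_from_uniprot_division.pl $i'
            . " >> uniprot_$div.fasta";
    push @printed, $cmd;
    print "$cmd\n";
}
```

Both iterations print exactly the same command, which is the behaviour described above.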

Reply by Jean-Karim Heriche

I'm very confused. I think the main work is done by the Perl script, which I have provided below.

The whole idea is to extract sequences with “Complete Proteome” in the keyword (KW) field from the downloaded Swiss-Prot and TrEMBL files. I am trying to repeat some analyses from this paper: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2002266#sec014 (Methods section, HGT analyses).

Reply by Seq225

The script screen_complete_proteome_from_uniprot_division.pl is never executed when you run the bash script you posted; the one-liner only prints the command. If you want to execute it from within the Perl one-liner, one option is to use the qx operator, i.e. replace print with qx, but that's not the only problem you have.
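
The difference between print and qx can be seen with a harmless command. In this sketch, echo hello is a stand-in for the real perl screen_complete_proteome_from_uniprot_division.pl call, and system is included for comparison.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

$| = 1;    # flush STDOUT so our prints and the child's output stay in order

# Harmless stand-in for the real command string.
my $cmd = 'echo hello';

# print only writes the text of the command; no shell ever runs it.
print "$cmd\n";

# qx// (backticks) runs the string through the shell and returns
# whatever the command wrote to STDOUT.
my $captured = qx($cmd);

# system() also runs the command; its STDOUT goes straight to ours,
# and it returns the exit status (0 means success).
my $status = system($cmd);
die "command failed: $?" if $status != 0;

print "qx captured: $captured";
```

If the command string contains a redirection such as >> uniprot_archaea.fasta, the shell sends the output to that file, so qx returns an empty string. In that case system is arguably the clearer choice, since its return value tells you whether the command succeeded.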

Reply by Jean-Karim Heriche

Thanks very much. I will replace print with qx. Also, if you don't mind and have the time, could you point out the other problems?

Thanks like an ocean!!

Reply by Seq225

For every file that is unzipped, the bash script passes the same string, 'uniprot_sprot_archaea.dat', to Perl, i.e. it always prints the line: perl screen_complete_proteome_from_uniprot_division.pl $i >> uniprot_archaea.fasta

Maybe you want to run the script screen_complete_proteome_from_uniprot_division.pl on each unzipped file? Then try something along these lines:

#!/bin/bash
for i in *dat.gz
do gunzip "$i"
echo "$i" | perl -slane '$_=~s/\.gz$//; # remove the .gz extension from the filename
                        $a=(split /\_/, $_)[2]; # split on _ and extract the third part
                        $a=~/(\w+)\.dat/; $b=$1; # extract all characters before .dat
                        qx(perl screen_complete_proteome_from_uniprot_division.pl $_ >> uniprot_$b.fasta)' # run the perl script on the unzipped file and append its output to the .fasta file
done
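
The filename handling in this loop can be checked on its own. The sketch below wraps it in a hypothetical helper, output_for, and feeds it two example names; it assumes the Swiss-Prot and TrEMBL downloads are named uniprot_sprot_<division>.dat.gz and uniprot_trembl_<division>.dat.gz.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: map a downloaded file name to
# (unzipped .dat name, output FASTA name), following the loop's steps.
sub output_for {
    my ($name) = @_;
    $name =~ s/\.gz$//;                             # uniprot_sprot_archaea.dat
    my $part = (split /_/, $name)[2];               # archaea.dat
    return unless defined $part;                    # not uniprot_<db>_<division>.dat
    my ($div) = $part =~ /(\w+)\.dat/ or return;    # archaea
    return ($name, "uniprot_$div.fasta");
}

my %out_of;
for my $gz ('uniprot_sprot_archaea.dat.gz', 'uniprot_trembl_archaea.dat.gz') {
    my ($dat, $fasta) = output_for($gz) or next;
    $out_of{$dat} = $fasta;
    print "$dat -> $fasta\n";
}
```

Both files map to the same uniprot_archaea.fasta, so a redirection with > would let the second run overwrite the first, whereas >> appends both.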

Reply by Jean-Karim Heriche

Great. Thank you! I will try these...

Reply by Seq225

I'd recommend redirecting the stdout of the bash script (the printed commands) to a file, checking it, and then executing that file. Running the Perl script from inside a loop makes debugging more difficult.
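
A minimal sketch of that dry-run approach, written in Perl rather than bash: it prints the commands instead of running them. The script name make_commands.pl and the output file commands.sh are hypothetical, and the sketch assumes file names of the form uniprot_<db>_<division>.dat.gz without spaces.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Dry run: build the shell commands instead of executing them.
# Assumed naming scheme: uniprot_<db>_<division>.dat.gz (no spaces).
sub commands_for {
    my @cmds;
    for my $gz (@_) {
        next unless $gz =~ /^(uniprot_[^_]+_([^_.]+)\.dat)\.gz$/;
        my ($dat, $div) = ($1, $2);
        push @cmds, "gunzip $gz",
            "perl screen_complete_proteome_from_uniprot_division.pl $dat >> uniprot_$div.fasta";
    }
    return @cmds;
}

# Usage: perl make_commands.pl *dat.gz > commands.sh
print "$_\n" for commands_for(@ARGV);
```

After perl make_commands.pl *dat.gz > commands.sh, you can read commands.sh, try a single line by hand, and then run everything with bash commands.sh.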

Reply by Ram

score 4 · Answer 1 · 2018-08-20

Ram

$a=(split /\_/, $_)[2]

Split the current line ($_) on underscores, take the third field (index 2) and assign it to $a

$a=~/(\w+).dat/; $b=$1

Capture the run of word characters (letters, digits, underscore) preceding .dat into $1, then assign that to $b
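
Here is the same split and capture spelled out step by step, as a small sketch. The names $third and $div are illustrative; the one-liner uses $a and $b, which work too but are the special variables used by sort. The \_ in the original is just an escaped underscore and behaves like _.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line = 'uniprot_sprot_archaea.dat';    # the string the bash script echoes

# split returns a list; wrapping it in ( ) and adding [2] takes a list
# slice, i.e. the third element (Perl counts from 0).
my @fields = split /_/, $line;             # ('uniprot', 'sprot', 'archaea.dat')
my $third  = (split /_/, $line)[2];        # 'archaea.dat'

# \w+ matches a run of letters, digits and underscores; the parentheses
# capture that run into $1. Here the dot is escaped so it only matches '.'.
my $div = '';
if ($third =~ /(\w+)\.dat/) {
    $div = $1;                             # 'archaea'
}

print join(' | ', @fields), "\n";
print "third field: $third\n";
print "captured:    $div\n";
```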

print "perl screen_complete_proteome_from_uniprot_division.pl \$i >> uniprot_".$b.".fasta"' -- -i=$i

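Build a command string from the variables obtained above and print it. Because of the backslash, \$i is printed as the literal text $i, and the command is only printed, not run.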

perl -slane

Run perldoc perlrun from the command line and read the manual; it explains each of the -s, -l, -a, -n and -e options.

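-s enables rudimentary switch parsing of the arguments after --, which is what makes the -- -i=$i part work: inside Perl, $i is set to the value of the bash variable $i.
-e is "execute this Perl code that I'm passing as a string", like bash -c or Rscript -e. Instead of reading the program from a script file, perl takes it from this command-line argument.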
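
To see the -s switch on its own, the sketch below starts a second perl the same way the bash script does and reads back what it prints. The file name passed in is just an example value; $^X holds the path of the currently running perl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Equivalent to running:
#   perl -se 'print $i' -- -i=uniprot_sprot_archaea.dat.gz
# The list form of open skips the shell, so no extra quoting is needed.
my @child = ($^X, '-se', 'print $i', '--', '-i=uniprot_sprot_archaea.dat.gz');
open(my $fh, '-|', @child) or die "cannot start perl: $!";
my $got = do { local $/; <$fh> };
close($fh) or die "child perl failed: $?";

# -s removed "-i=..." from the child's @ARGV and set its $i to the value.
print "child saw \$i = $got\n";
```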

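-n wraps the code in a loop that runs once per input line (here, the single line that echo pipes in), and -l strips the newline from each input line and adds one to each print. So perl -nle is essentially like awk: it runs the code passed as an argument on every line of input.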

-a autosplits each input line on whitespace into the array @F. The one-liner never uses @F, so -a appears to have no effect here.
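
Spelled out as an ordinary script, -n, -l and -a amount roughly to the loop below (perldoc perlrun gives the exact expansion). The input is an in-memory stand-in for what echo pipes into perl in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# In-memory stand-in for what `echo uniprot_sprot_archaea.dat |` sends on STDIN.
my $stdin_text = "uniprot_sprot_archaea.dat\n";
open(my $in, '<', \$stdin_text) or die "cannot open in-memory input: $!";

my @out;
{
    local $\ = "\n";                        # -l: every print gets a trailing newline
    while (my $line = <$in>) {              # -n: run the body once per input line
        chomp $line;                        # -l: remove the input newline
        my @F = split ' ', $line;           # -a: autosplit on whitespace into @F
        # The question's code never reads @F, so -a changes nothing there.
        my $third = (split /_/, $line)[2];  # from here on: the one-liner's own logic
        my ($div) = $third =~ /(\w+)\.dat/;
        my $name = "uniprot_$div.fasta";
        push @out, $name;
        print $name;
    }
}
close($in);
```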

Great!! Thank you very much, Ram. I am running the script, but it is not giving me what I want. I'm not sure whether something is wrong with the screen_complete_proteome_from_uniprot_division.pl script.

Here is what it looks like:

#!/usr/bin/env perl
use strict;
use warnings;
use G;
# perl screen_complete_proteome_from_uniprot_division.pl EMBL_format.dat
## EMBL_format.dat, e.g. uniprot_sprot_archaea.dat
my $input = shift;
my %out = &get_fasta($input);
sub get_fasta{
  my $input = $_[0];
  my $tree = readFile($input, -format=>"swiss" );
  my ($dat, $div) = (split /\_/, $input)[1,2];
  $div =~ s/\.dat$//;   # strip the .dat extension (dot escaped, anchored at the end)
  foreach my $entry ( sort keys %{$tree} ) {
    if( defined $tree->{$entry}->{KW} && $tree->{$entry}->{KW} =~ /Complete\sproteome/ ) {
      next if $tree->{$entry}->{OC} =~ /Tardigrada/ ;
#      next if $tree->{$entry}->{OC} =~ /Nematoda/ ;
#      next if $tree->{$entry}->{OC} =~ /Arthropoda/ ;
      my %fasta;
      my $seq = $tree->{$entry}->{"  "};
      $seq =~ s/\s+//g;
      say $tree->{$entry}->{LOCUS}->{id}."|".$dat."|".$div if $tree->{$entry}->{OC} =~ /Metazoa/;
      $fasta{$tree->{$entry}->{LOCUS}->{id}."|".$dat."|".$div} = $seq;
      say to_fasta(%fasta);
    }
  }
}

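One way to narrow the problem down is to check, without the G library, whether the .dat file contains any entries that pass this filter at all. The sketch below is not what G's readFile does internally: it assumes the standard UniProt flat-file layout (ID, OC, KW and SQ line codes, sequence lines after SQ, // between entries) and uses the entry name from the ID line in the header. The function name complete_proteome_fasta is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return (header => sequence) pairs for entries whose
# KW lines mention "Complete proteome" and whose OC lines lack Tardigrada.
sub complete_proteome_fasta {
    my ($fh, $tag) = @_;
    my %fasta;
    my ($id, $kw, $oc, $seq, $in_seq) = ('', '', '', '', 0);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m{^//}) {                         # "//" closes an entry
            if ($kw =~ /Complete\sproteome/ && $oc !~ /Tardigrada/) {
                $fasta{"$id|$tag"} = $seq;
            }
            ($id, $kw, $oc, $seq, $in_seq) = ('', '', '', '', 0);
        }
        elsif ($in_seq) {                              # sequence lines follow SQ
            $line =~ s/\s+//g;
            $seq .= $line;
        }
        elsif ($line =~ /^ID   (\S+)/) { $id = $1 }    # entry name
        elsif ($line =~ /^KW   (.*)/)  { $kw .= " $1" }
        elsif ($line =~ /^OC   (.*)/)  { $oc .= " $1" }
        elsif ($line =~ /^SQ   /)      { $in_seq = 1 }
    }
    return %fasta;
}

# Usage (hypothetical script name): perl check_kw.pl uniprot_sprot_archaea.dat
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "cannot open $file: $!";
    my %fa = complete_proteome_fasta($fh, $file);
    print ">$_\n$fa{$_}\n" for sort keys %fa;
    close($fh);
}
```

If this prints nothing for a given file, that suggests the file has no entry carrying the keyword, and the problem lies in the input rather than in the screening script.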

Would you be able to help me figure it out?

I appreciate your input very very much!

Reply by Seq225

See my comment above.

Reply by Jean-Karim Heriche

I'm sorry, I'm not in a position to debug Perl code; I've been out of touch with Perl for a while now, and Perl is a difficult language to debug to begin with.