Perl Command : Add A Column With Strand Information
11.0 years ago
biolab ★ 1.4k

Dear all,

I have a BLAST output file and need to add a column for strand information (+ or -). For example,

gene1 contig2 1 69 100 169
gene2 contig20 3 53 250 200

I need to change it to

gene1 contig2 1 69 100 169 +
gene2 contig20 3 53 250 200 -

Note: 100 < 169 gives +, 250 > 200 gives -.

I am new to Perl programming. My command is

$ cat a.txt | perl -e 'while (<>){chomp; @array = split(//, $_); if ($array[4]< $array[5]){print"@array\t+\n"} else {print"@array\t-\n"} }'

The output is

g e n e -   c o n t i g 2   1   6 9   1 0 0   1 6 9
g e n e -   c o n t i g 2 0   3   5 3   2 5 0   2 0 0

Could anyone help me correct the errors and briefly explain them? Thank you very much!


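Try replacing @array = split(//, $_); with @array = split(/\s/, $_); or simply @array = split; An empty pattern (//) splits the line into single characters, which is why every character in your output is separated by a space.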


Thank you very much for the correction.

11.0 years ago
SES 8.6k

Here's a simpler Perl solution (similar to the Awk solution of Frédéric Mahé):

$ echo -e 'gene1 contig2 1 69 100 169\ngene2 contig20 3 53 250 200' \
| perl -ane 'print join "\t", @F, $F[4] > $F[5] ? "-\n" : "+\n"'
gene1    contig2     1    69    100    169    +
gene2    contig20    3    53    250    200    -

You could make it perhaps more readable by adding explicit loops and variables, but for one-liners I think it's best to use the tools you have and save yourself some typing.

EDIT: Perl's command line switches are documented in perlrun (type perldoc perlrun from the command line).

  • The -e switch tells Perl to execute the code given as the next argument instead of reading a script from a file. The input still comes from any files named on the command line, or from STDIN (as is the case above).
  • The -n switch makes Perl loop over the input line by line (-p does the same, but also prints each line automatically).
  • The -a switch tells Perl to autosplit each line into an array called @F when used with -n or -p. You can change the delimiter with the -F switch.

Thank you! The shorter Perl command is really cool.


Hi SES, can I ask you one more question? What does the -ane option stand for? Would you please briefly explain these switches to me? I googled perl -ane but did not find an answer. Thank you very much!


I updated my post to add an explanation of the command.


Your explanations are really informative and are helping me learn Perl. Thanks!

11.0 years ago
PoGibas 5.1k

Simple awk solution: awk '{if ($5>$6) print $0,"-"; else print $0,"+"}' INPUT

echo -e 'gene1 contig2 1 69 100 169\ngene2 contig20 3 53 250 200' | awk '{if ($5>$6) print $0,"-"; else print $0,"+"}'
gene1 contig2 1 69 100 169 +
gene2 contig20 3 53 250 200 -

Hi Pgibas, in Awk the if-then-else conditional can be replaced with a ternary operator (shorter, but maybe less clear):

echo -e 'gene1 contig2 1 69 100 169\ngene2 contig20 3 53 250 200' | awk '{print $0,($5 > $6) ? "-" : "+"}'
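
If tab-separated output is wanted, as in the Perl one-liner, a variant along these lines might work. This is only a sketch: the function name add_strand_awk is made up for this example, and the extra parentheses around the ternary are there to keep the print statement unambiguous across awk implementations.

```shell
# Hypothetical wrapper name; prints every column plus the strand, tab-separated.
add_strand_awk() {
    awk 'BEGIN { OFS = "\t" }
         {
             $1 = $1    # reassigning a field makes awk rebuild $0 using OFS
             print $0, (($5 > $6) ? "-" : "+")
         }' "$@"
}

printf 'gene1 contig2 1 69 100 169\ngene2 contig20 3 53 250 200\n' | add_strand_awk
```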


Thank you, this is really cool


Thank you very much for your solutions!
