Use Perl To Edit Each Line Of A File
10.8 years ago
liupfskygre ▴ 210

Hi, all

I have a file, a.txt, that contains lines like the ones below:

mez:Mtc_0001 glycosyltransferase

mez:Mtc_0002 feoA; Iron dependent transcriptional regulator; K03709 DtxR family transcriptional regulator, Mn-dep

mez:Mtc_0003 feoB; ferrous iron transporter FeoB; K04759 ferrous iron transport protein B

(There are multiple spaces between Mtc_000x and the text that follows.)

I want to use Perl to do the following, but I have only just begun learning it:

1) delete the "mez" prefix at the beginning of every line;

2) for each line, separate the fields with "\t" instead of the multiple spaces or ";";

3) print and store the results in a new b.txt file and keep the a.txt file unchanged.

Could you give me some suggestions on this?

thanks!


Why Perl? A single sed command would do.


Thanks. I am also trying to learn Perl but cannot figure things out yet. Maybe I will be able to once I have gone through the regular expression chapter.

10.8 years ago
perl -ne '$_=~s/^mez://;$_=~s/;\s+/\t/g;print $_;' a.txt > b.txt

use perl:

perl -ne 'your perl code'

delete all "mez:" at the beginning:

$_=~s/^mez://;

change all ";" followed by spaces to "\t":

$_=~s/;\s+/\t/g;

write result to new file without changing the input file:

> b.txt

Well done, and nice explanation of the parts (+1)! You can, however, do the following:

perl -pe 's/^mez://;s/;\s+/\t/g' a.txt > b.txt

As you likely know, s/// implicitly operates on $_, so it's not necessary to name $_ explicitly. The -p switch loops over the input like -n and also prints each line after the code runs, so -n is not needed alongside it.

Nit: "... followed by spaces..." -> s/(?=spaces)/white/


Thanks! It worked well!


You're most welcome!


thanks

There are multiple spaces between Mtc_000x and the text that follows. How do I change those spaces into a tab ("\t") too? Something like $_=~s/#multi-spaces#/\t/, right? I reviewed the book and now understand how s/// is used, but what do the "=~" symbol and "^" mean?


David Langenberger's s/;\s+/\t/g substitutes a tab for a semicolon followed by one or more whitespace characters. However, you also want the same tab substitution for the whitespace after the Mtc_000x pattern. You can use the above, with just a couple of changes, right after the first substitution:

s/\s+/\t/

This will replace the first run of whitespace with a tab. Given the above, the final one-liner could be:

perl -pe 's/^mez://;s/\s+/\t/;s/;\s+/\t/g' a.txt > b.txt

The =~ symbol is Perl's regex binding operator: it applies the pattern on its right to the string on its left. The ^ above is an anchor that matches only at the beginning of the line.
