Question

Removing text after last underscore on given column

4.7 years ago

genomes_and_MGEs ▴ 10

Hey guys,

I have a tab-delimited file like this

NZ_CP007546.1_En_asburiae4561905        434     17636
NZ_CP007546.1_En_asburiae4561905        85823   93173
NZ_CP007546.1_En_asburiae4561905        178912  203202
NZ_CP007546.1_En_asburiae4561905        313008  317041

...

I want to remove text after the last underscore on the 1st column, so that I have

NZ_CP007546.1_En        434     17636
NZ_CP007546.1_En        85823   93173
NZ_CP007546.1_En        178912  203202
NZ_CP007546.1_En        313008  317041

I know that I can use sed to do that, but when I use sed -i 's/_[^_]*$//' it removes everything from the last underscore to the end of the line, including the other columns. My goal is to do that only for the 1st column. Thanks!
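
For illustration, a minimal sketch of why that expression eats the other columns: `$` anchors at the end of the whole line, and `[^_]*` happily matches tabs and digits, so everything after the last underscore goes. The variable names `line` and `stripped` are only for this example.

```shell
# One line from the sample above, built with real tab characters.
line=$(printf 'NZ_CP007546.1_En_asburiae4561905\t434\t17636')

# [^_]* also matches tabs and digits, and $ is the end of the line,
# so the match starts at the last underscore and swallows columns 2 and 3.
stripped=$(printf '%s\n' "$line" | sed 's/_[^_]*$//')

printf '%s\n' "$stripped"   # prints: NZ_CP007546.1_En
```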

sequence • 1.6k views

updated 4.7 years ago by JC 13k • written 4.7 years ago by genomes_and_MGEs ▴ 10

score 3 · Answer 1 · 2020-09-10

4.7 years ago

Pierre Lindenbaum 166k

cat in.txt | rev | sed 's/\t[^_\t]*_/\t/'  | rev

EDIT:

Sorry, much simpler is:

sed 's/_[^_\t]*\t/\t/' < input.txt
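
A field-aware alternative, sketched here with awk, applies the substitution to column 1 only, so the `$` anchor means "end of the first field" rather than "end of the line". It also avoids `\t` inside a sed pattern, which is a GNU sed extension. The two-line `sample` and the variable name `trimmed` are made up for this example; on a real file the awk program would simply be given the file name.

```shell
# Hypothetical two-line sample standing in for the real file.
sample=$(printf 'NZ_CP007546.1_En_asburiae4561905\t434\t17636\nNZ_CP007546.1_En_asburiae4561905\t85823\t93173')

# FS/OFS keep input and output tab-delimited.
# sub() edits only $1: drop the last underscore and everything after it.
trimmed=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS = OFS = "\t" } { sub(/_[^_]*$/, "", $1); print }')

printf '%s\n' "$trimmed"
```

Because the substitution is scoped to `$1`, underscores in later columns are never touched, and a first column without any underscore is left as it is.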

score 1 · Answer 2 · 2020-09-10

4.7 years ago

JC 13k

perl -pe 's/(_\w+)_\w+/$1/' < in.txt
NZ_CP007546.1_En        434     17636 
NZ_CP007546.1_En        85823   93173
NZ_CP007546.1_En        178912  203202
NZ_CP007546.1_En        313008  317041
