Question

Use egrep on only one column

1

Entering edit mode

8.2 years ago

ISB ▴ 30

Hi,

I have a table like this;

6       6:29002062:rs7755402    0       29002062        G       A
6       6:29004091:rs9468471    0       29004091        A       G
6       6:29006250:rs9468473    0       29006250        A       G
6       6:29006493:rs9461499    0       29006493        C       A
6       6:29006844:rs7743837    0       29006844        G       A

I want to remove everything before "rs" in the second column. I know I can use egrep,

 egrep -o "(rs\S+)" file | cut -d " " -f 2 > newfile

However, then I`m left with only the rs string;

But I actually want the other columns too. Any help is greatly appreciated!

unix grep • 4.4k views

ADD COMMENT • link updated 8.2 years ago by Rohit ★ 1.5k • written 8.2 years ago by ISB ▴ 30

1

Entering edit mode

8.2 years ago

harold.smith.tarheel ★ 5.0k

Use awk with a colon as delimiter and print the third field:

awk -F ':' '{print $3}' file > newfile

ADD COMMENT • link 8.2 years ago by harold.smith.tarheel ★ 5.0k

0

Entering edit mode

Terrific!

But, I also want to keep the first column ("6"), do you know how to adapt it for doing so?

Thank you soo much!

ADD REPLY • link 8.2 years ago by ISB ▴ 30

1

Entering edit mode

Try using multiple delimiters in the -F for awk. e.g. [\t|:]

Edit:

Well since people seem intent on providing ready to run solutions here we go awk 'BEGIN{FS="\t|:"; OFS="\t"}{print $1,$4,$5,$6,$7,$8}' your_file > new_file

If your file has spaces (instead of uniform tabs) then use the sed solution below to convert.

ADD REPLY • link 8.2 years ago by GenoMax 151k

0

Entering edit mode

8.2 years ago

st.ph.n ★ 2.7k

An issue here is the number of spaces, and not tabs between your columns, and I'm assuming that cols 1,4 represent the information in col 2 separting by :. To make this easier, you can convert any number of spaces to tabs with sed:

sed 's/ \+ /\t/g' infile > tab_sep_file

then cut, replacing ':' with tab:

cut -f 2 tab_sep_file | cut -f 1,3 -d ':' | tr ':' '\t' > newfile

ADD COMMENT • link 8.2 years ago by st.ph.n ★ 2.7k

0

Entering edit mode

Thank you, but these commands leave me with only the two first columns

ADD REPLY • link 8.2 years ago by ISB ▴ 30

0

Entering edit mode

Ok, your comment on harold's answer led me to believe you only needed col 1, and the separated field in col 2.

ADD REPLY • link 8.2 years ago by st.ph.n ★ 2.7k

0

Entering edit mode

Sorry for the misunderstanding!

ADD REPLY • link 8.2 years ago by ISB ▴ 30

0

Entering edit mode

8.2 years ago

Jake Warner ▴ 840

sed 's/[0-9]:.*://' infile > outfile

The pattern recognizes any number, a colon, anything of any length, another colon and deletes.

Example below (the echo call just puts it through):

$echo "6       6:29002062:rs7755402    0       29002062        G       A "| sed 's/[0-9]:.*://'
6 rs7755402 0 29002062 G A

ADD COMMENT • link 8.2 years ago by Jake Warner ▴ 840

0

Entering edit mode

8.2 years ago

Rohit ★ 1.5k

Why not use awk replace if you want specific columns replaced? Also stackoverflow is full of such examples

awk -F'\t' 'BEGIN{OFS="\t"} {gsub(".*:rs","rs",$2); print }' infile >outfile

ADD COMMENT • link 8.2 years ago by Rohit ★ 1.5k

0

Entering edit mode

original input from OP is space delimited, would need to convert to tab.

ADD REPLY • link 8.2 years ago by st.ph.n ★ 2.7k

0

Entering edit mode

I guess it is space delimited due to formatting issue. OP has a table, for which tab-delimitation looks more natural.

ADD REPLY • link 8.2 years ago by Santosh Anand 5.8k

1

Entering edit mode

Usually if I have idea of the exact delimiter (space or tab) and its a combination, I use awk without the input delimiter -

awk 'BEGIN{OFS="\t"} {gsub(".*:rs","rs",$2); print }' infile >outfile

ADD REPLY • link 8.2 years ago by Rohit ★ 1.5k

0

Entering edit mode

That's why I included the sed line above.

ADD REPLY • link 8.2 years ago by st.ph.n ★ 2.7k

0

Entering edit mode

I agree that the sed works, but it does not work on the selected column-only replace principle. Nevertheless sed would be my favorite too - was just proposing an alternative here

ADD REPLY • link 8.2 years ago by Rohit ★ 1.5k

score 3 · Accepted Answer · 2017-03-29

3

Entering edit mode

8.2 years ago

shenwei356 8.7k

Shell is great, but you will like csvtk, a cross-platform, efficient, practical and pretty CSV/TSV toolkit.

Using csvtk replace to edit specific column(s), download, usage.

$ csvtk replace -H -t -f 2 -p '.+:' -r '' file
6       rs7755402       0       29002062        G       A
6       rs9468471       0       29004091        A       G
6       rs9468473       0       29006250        A       G
6       rs9461499       0       29006493        C       A
6       rs7743837       0       29006844        G       A

long-option version:

$ csvtk replace --no-header-row --tabs --fields 2 --pattern '.+:' --replacement ''  file

ADD COMMENT • link 8.2 years ago by shenwei356 8.7k

0

Entering edit mode

Awesome! Thank you very much! This was exactly what I needed

ADD REPLY • link 8.2 years ago by ISB ▴ 30

0

Entering edit mode

pleased to hear you like it, you can visit the site and explore the rich functions it provides.

ADD REPLY • link 8.2 years ago by shenwei356 8.7k

score 1 · Accepted Answer · 2017-03-29

1

Entering edit mode

8.2 years ago

Santosh Anand 5.8k

sed 's/\t\b.*:rs/\trs/' file > newfile

Explantion:

s/ = substitution
\t\b = look for tab, and then a word boundary
.*:rs/\trs = ./ matches one or more charaters. you match everything till rs and replace it by tab and rs (ie \trs)

ADD COMMENT • link 8.2 years ago by Santosh Anand 5.8k

0

Entering edit mode

@Santosh: This solution does not seem to be working as advertised nor does it produce output required by OP. Can you recheck?

ADD REPLY • link 8.2 years ago by GenoMax 151k

0

Entering edit mode

oops, my bad! used space before the word boundary, which should be tab. updated.

ADD REPLY • link 8.2 years ago by Santosh Anand 5.8k

1

Entering edit mode

Perfect.

ISB : You are able to accept more than one answers. Since you got multiple options here you should test and "accept" other solutions that work.

ADD REPLY • link 8.2 years ago by GenoMax 151k