Use egrep on only one column
6
1
Entering edit mode
7.7 years ago
ISB ▴ 30

Hi,

I have a table like this;

6       6:29002062:rs7755402    0       29002062        G       A
6       6:29004091:rs9468471    0       29004091        A       G
6       6:29006250:rs9468473    0       29006250        A       G
6       6:29006493:rs9461499    0       29006493        C       A
6       6:29006844:rs7743837    0       29006844        G       A

I want to remove everything before "rs" in the second column. I know I can use egrep,

 egrep -o "(rs\S+)" file | cut -d " " -f 2 > newfile

However, then I`m left with only the rs string;

rs7755402
rs9468471
rs9468473
rs9461499
rs7743837
rs6919044
rs41424052
rs6924824
rs6456886
rs6456887

But I actually want the other columns too. Any help is greatly appreciated!

unix grep • 3.8k views
ADD COMMENT
3
Entering edit mode
7.7 years ago

Shell is great, but you will like csvtk, a cross-platform, efficient, practical and pretty CSV/TSV toolkit.

Using csvtk replace to edit specific column(s), download, usage.

$ csvtk replace -H -t -f 2 -p '.+:' -r '' file
6       rs7755402       0       29002062        G       A
6       rs9468471       0       29004091        A       G
6       rs9468473       0       29006250        A       G
6       rs9461499       0       29006493        C       A
6       rs7743837       0       29006844        G       A

long-option version:

$ csvtk replace --no-header-row --tabs --fields 2 --pattern '.+:' --replacement ''  file
ADD COMMENT
0
Entering edit mode

Awesome! Thank you very much! This was exactly what I needed

ADD REPLY
0
Entering edit mode

pleased to hear you like it, you can visit the site and explore the rich functions it provides.

ADD REPLY
1
Entering edit mode
7.7 years ago
sed 's/\t\b.*:rs/\trs/' file > newfile

Explantion:

s/ = substitution
\t\b = look for tab, and then a word boundary
.*:rs/\trs = ./ matches one or more charaters. you match everything till rs and replace it by tab and rs (ie \trs)
ADD COMMENT
0
Entering edit mode

@Santosh: This solution does not seem to be working as advertised nor does it produce output required by OP. Can you recheck?

ADD REPLY
0
Entering edit mode

oops, my bad! used space before the word boundary, which should be tab. updated.

ADD REPLY
1
Entering edit mode

Perfect.

ISB : You are able to accept more than one answers. Since you got multiple options here you should test and "accept" other solutions that work.

ADD REPLY
1
Entering edit mode
7.7 years ago

Use awk with a colon as delimiter and print the third field:

awk -F ':' '{print $3}' file > newfile
ADD COMMENT
0
Entering edit mode

Terrific!

But, I also want to keep the first column ("6"), do you know how to adapt it for doing so?

Thank you soo much!

ADD REPLY
1
Entering edit mode

Try using multiple delimiters in the -F for awk. e.g. [\t|:]

Edit:

Well since people seem intent on providing ready to run solutions here we go awk 'BEGIN{FS="\t|:"; OFS="\t"}{print $1,$4,$5,$6,$7,$8}' your_file > new_file

If your file has spaces (instead of uniform tabs) then use the sed solution below to convert.

ADD REPLY
0
Entering edit mode
7.7 years ago
st.ph.n ★ 2.7k

An issue here is the number of spaces, and not tabs between your columns, and I'm assuming that cols 1,4 represent the information in col 2 separting by :. To make this easier, you can convert any number of spaces to tabs with sed:

sed 's/ \+ /\t/g' infile > tab_sep_file

then cut, replacing ':' with tab:

cut -f 2 tab_sep_file | cut -f 1,3 -d ':' | tr ':' '\t' > newfile
ADD COMMENT
0
Entering edit mode

Thank you, but these commands leave me with only the two first columns

ADD REPLY
0
Entering edit mode

Ok, your comment on harold's answer led me to believe you only needed col 1, and the separated field in col 2.

ADD REPLY
0
Entering edit mode

Sorry for the misunderstanding!

ADD REPLY
0
Entering edit mode
7.7 years ago
Jake Warner ▴ 840
sed 's/[0-9]:.*://' infile > outfile

The pattern recognizes any number, a colon, anything of any length, another colon and deletes.

Example below (the echo call just puts it through):

$echo "6       6:29002062:rs7755402    0       29002062        G       A "| sed 's/[0-9]:.*://'
6 rs7755402 0 29002062 G A
ADD COMMENT
0
Entering edit mode
7.7 years ago
Rohit ★ 1.5k

Why not use awk replace if you want specific columns replaced? Also stackoverflow is full of such examples

awk -F'\t' 'BEGIN{OFS="\t"} {gsub(".*:rs","rs",$2); print }' infile >outfile
ADD COMMENT
0
Entering edit mode

original input from OP is space delimited, would need to convert to tab.

ADD REPLY
0
Entering edit mode

I guess it is space delimited due to formatting issue. OP has a table, for which tab-delimitation looks more natural.

ADD REPLY
1
Entering edit mode

Usually if I have idea of the exact delimiter (space or tab) and its a combination, I use awk without the input delimiter -

awk 'BEGIN{OFS="\t"} {gsub(".*:rs","rs",$2); print }' infile >outfile
ADD REPLY
0
Entering edit mode

That's why I included the sed line above.

ADD REPLY
0
Entering edit mode

I agree that the sed works, but it does not work on the selected column-only replace principle. Nevertheless sed would be my favorite too - was just proposing an alternative here

ADD REPLY

Login before adding your answer.

Traffic: 2575 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6