How to remove all characters after a specific pattern?
5.4 years ago
star ▴ 350

I have a large table like 'df' below, and I would like to remove everything after the first ':' within each column of every row.

I have tried:

cat df.bed | cut -f1 -d":" | head

and

cat df.bed | sed 's/:.*//' | head

but both removed every column after the first ':' on the line.

df:

rs1006501   T   A   0/0:14,0:14:42:0,42,596    A    0/0:5,0:5:15:0,15,177
rs1006502   NA  NA  NA                         C,T  ./.
rs1015190   NA  NA  NA                         T    1/1:0,2:2:6:75,6,0
rs10164686  G   A   0/0:1,0:1:3:0,3,46          NA  NA

desired output:

rs1006501   T   A   0/0    A    0/0
rs1006502   NA  NA  NA     C,T  ./.
rs1015190   NA  NA  NA     T    1/1
rs10164686  G   A   0/0    NA   NA
linux RNA-Seq awk sed • 1.2k views

I'd try to replace every instance of <TAB><SOMETHING_MINIMAL_WITH_NO_TABS>:<SOMETHING_GREEDY_WITH_NO_TABS> with <TAB><SOMETHING_MINIMAL_WITH_NO_TABS>. I would do it in perl rather than sed, because sed is pretty ugly when matching on tabs.

5.4 years ago

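Try this (it relies on GNU sed, which understands \t inside a bracket expression). It deletes everything from each ':' up to the next tab, whereas ':.*' is greedy and runs to the end of the line: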

sed 's/:[^\t]*//g' df.bed
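If \t inside brackets is not supported by your sed (other implementations may treat it as a backslash and a 't'), a possible workaround is to pass a literal tab through a shell variable. This is only a sketch: it assumes df.bed is tab-separated, and the sample rows and temporary file are made up for illustration.

```shell
# Hypothetical two-row, tab-separated sample standing in for df.bed.
sed_sample=$(mktemp)
printf 'rs1015190\tNA\tNA\tNA\tT\t1/1:0,2:2:6:75,6,0\n' > "$sed_sample"
printf 'rs10164686\tG\tA\t0/0:1,0:1:3:0,3,46\tNA\tNA\n' >> "$sed_sample"

# Hold a literal tab in a variable so the bracket expression does not
# depend on \t being interpreted by sed.
tab=$(printf '\t')

# Delete each ':' together with the non-tab characters that follow it.
sed_out=$(sed "s/:[^${tab}]*//g" "$sed_sample")
printf '%s\n' "$sed_out"
rm -f "$sed_sample"
```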
5.4 years ago
Jeffin Rockey ★ 1.3k
awk -F$'\t' -v OFS="\t" '{split($4,fourth,":");split($6,sixth,":");print $1,$2,$3,fourth[1],$5,sixth[1]}' df.bed
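The command above hard-codes columns 4 and 6. If the colon-separated values can appear in other columns, one possible generalization is to loop over every field. Again, this is a sketch that assumes tab-separated input, and the sample rows and temporary file are invented for illustration.

```shell
# Hypothetical two-row, tab-separated sample standing in for df.bed.
awk_sample=$(mktemp)
printf 'rs1006501\tT\tA\t0/0:14,0:14:42:0,42,596\tA\t0/0:5,0:5:15:0,15,177\n' > "$awk_sample"
printf 'rs1006502\tNA\tNA\tNA\tC,T\t./.\n' >> "$awk_sample"

# Split and rejoin on tabs; in every field, strip from the first ':' to the
# end of that field. Here '.*' is safe because it only sees a single field.
awk_out=$(awk 'BEGIN { FS = OFS = "\t" }
               { for (i = 1; i <= NF; i++) sub(/:.*/, "", $i); print }' "$awk_sample")
printf '%s\n' "$awk_out"
rm -f "$awk_sample"
```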