sed or awk command
3
0
5.5 years ago
harry ▴ 40
ENST00000448914.1   13  4.28456     0       0
ENST00000415118.1   8   3.52171     0       0

How do I remove the version suffix (the dot and everything after it) from column 1, so that the output looks like this:

ENST00000448914     13  4.28456     0       0
ENST00000415118     8   3.52171     0       0

Please tell me a sed or awk command that removes only that suffix.

RNA-Seq • 1.3k views
2
5.5 years ago
AK ★ 2.2k

Hi harry,

By awk:

awk 'BEGIN{OFS="\t"} {gsub("\\.[0-9]+$", "", $1); print}'

(updated) For sed you can try:

sed -r 's/\.[0-9]+\t/\t/'
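
To check the awk approach quickly, here is a minimal sketch run on one sample row from the question. The function name `strip_version_awk` is just a hypothetical wrapper for illustration, and the sketch assumes the input is whitespace-separated and that tab-separated output is acceptable.

```shell
# Hypothetical wrapper (not part of the answer above): removes a trailing
# ".<digits>" from the first field only and prints the row tab-separated.
strip_version_awk() {
    awk 'BEGIN{OFS="\t"} {sub(/\.[0-9]+$/, "", $1); print}' "$@"
}

# Sample row from the question, piped in on stdin.
printf 'ENST00000448914.1 13 4.28456 0 0\n' | strip_version_awk
```

Because the substitution is applied to `$1` only, the decimal point in column 3 is never touched.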
0
5.5 years ago

Hi harry

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

You could try sed like this

sed 's/\.1//'
0

This would only address .1s. We should account for .\d+, right?

0
5.5 years ago
vin.darb ▴ 300

If the gene ID is always in the first column:

sed 's/\.[0-9]\{1,\}//' yourfile.txt

should work

0

It will also remove a (.) from other columns.

0

That's odd: I tried it and it doesn't remove the other (.), because I didn't put the 'g' (global) flag after the last slash.

0

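It might, if the first . it encounters on a line is not the transcript version (for example, when an ID has no version suffix, the first dot is the decimal point in a later column). The awk solution, or yours modified to anchor the match at the start of the line and confine it to the first word, would be safe.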
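
One possible way to write the anchored, first-word-only variant is sketched below. This is an assumption about what such a regex could look like, not a command anyone posted in the thread; `strip_version` is a hypothetical helper name, and `-E` assumes a sed that supports extended regular expressions (GNU and BSD sed both do).

```shell
# Hypothetical helper: strips ".<digits>" only when it ends the first
# whitespace-delimited word on the line.
#   ^([^[:space:]]+)   the first word, anchored at the start of the line
#   \.[0-9]+           the version suffix to drop
#   ([[:space:]]|$)    the suffix must be followed by a separator or end of line
strip_version() {
    sed -E 's/^([^[:space:]]+)\.[0-9]+([[:space:]]|$)/\1\2/' "$@"
}

# A versioned ID: only the ".1" in column 1 is removed.
printf 'ENST00000448914.1\t13\t4.28456\t0\t0\n' | strip_version

# An ID without a version (made-up row): the line is left unchanged,
# whereas the unanchored command would strip ".28456" from column 2.
printf 'GENE1\t4.28456\t0\n' | strip_version
```

It can also be run on a file directly, e.g. `strip_version yourfile.txt`.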
