How to remove characters after a specific symbol from all columns, and separate all columns except the first by a comma and a space
4.5 years ago
Hann ▴ 110

Hi all,

I am trying to modify the format of a big file.

The file is tab-delimited. Here is how it looks:

AB11.1  CB:0078_0.53    CB:0044464_0.42   CB:0005623_0.466
AB10.1  
AB01.2  CB:0036_0.4   CB:0003824_0.4       CB:0005575_0.7    CB:0005622_0.2 CB:0005623_0.6
AB01.2  CB:0036_0.3   CB:0003824_0.43      CB:0005575_0.7    CB:0005622_0.1

Please note that the number of columns is not the same for every row. A row can have more than 400 columns or only 1, and a few rows contain nothing but the ID, like AB10.1.

I want to modify the format, first by removing everything from the symbol _ onward (including the symbol itself), and then by changing the separators:

1- Only the first column is followed by a tab

2- From the second column to the last, the columns should be separated by a comma and then a space

So output file should look like this:

AB11.1    CB:0078, CB:0044464, CB:0005623
AB10.1  
AB01.2    CB:0036, CB:0003824, CB:0005575, CB:0005622, CB:0005623
AB01.2    CB:0036, CB:0003824, CB:0005575, CB:0005622

How can I do that in a bash script (I have very basic knowledge), or maybe in Python (which I have never used)?

bash • 841 views
4.5 years ago
Ram 44k

Use sed for requirement 1. You want to remove all _\S+ (or, if your format only has digits and . following the underscore, remove all _[0-9.]+).
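A minimal sketch of that first step, assuming a sed that supports -E (both GNU and BSD sed do) and that everything from an underscore up to the next whitespace should go. The function name strip_scores and the file names in the usage comment are hypothetical. It uses the POSIX class [^[:space:]] rather than \S, since \S is a GNU extension.

```shell
# Hypothetical helper: delete "_" plus the run of non-whitespace after it,
# e.g. "CB:0078_0.53" becomes "CB:0078". Reads stdin, writes stdout.
strip_scores() {
    sed -E 's/_[^[:space:]]+//g'
}

# Usage (file names are placeholders):
#   strip_scores < input.tsv > trimmed.tsv
```

Note that this also trims the first column if an ID ever contains an underscore; the sample IDs shown in the question do not.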

Use awk or perl for the second requirement. It will be a bit tricky (you may have to loop from 2 to NF), but it will be easier than using R or learning python.
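As a rough illustration of the loop from 2 to NF, here is one way both requirements could be handled in a single awk pass. It is a sketch under assumptions: the input is strictly tab-delimited, empty fields (such as the one left by a trailing tab) can be dropped, and the function name reformat_terms and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: trim "_score" suffixes and join fields 2..NF with ", ".
# Field 1 stays separated from the rest by a single tab.
reformat_terms() {
    awk -F'\t' '
    NF == 0 { print; next }            # pass completely empty lines through
    {
        out = $1 "\t"                  # ID column, then the one remaining tab
        sep = ""
        for (i = 2; i <= NF; i++) {
            if ($i == "") continue     # skip empty fields (e.g. a trailing tab)
            sub(/_.*$/, "", $i)        # drop the underscore and everything after it
            out = out sep $i
            sep = ", "
        }
        print out
    }'
}

# Usage (file names are placeholders):
#   reformat_terms < input.tsv > output.tsv
```

Rows that contain only an ID, like AB10.1, come out as the ID followed by a tab. Because the first field is never passed to sub(), an underscore in an ID would be left untouched.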


Yes, I managed to do it with awk and sed;

To remove the last 6 characters from each column of a file: awk '{for(i=1;i<=NF;i++) sub(/......$/,X,$i)}1'


That assumes you need to remove exactly 6 characters from each field, which doesn't seem to be the case. Please be careful with such assumptions.


Because of this, the first column will be emptied, since it contains only 6 characters.
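To make the two objections concrete, the fixed-width command can be wrapped in a function and run on one line built from the question's sample values. The function name trim_six is hypothetical; the awk program is the one posted in the reply, unchanged.

```shell
# The fixed-width awk command from the reply, wrapped for illustration only.
# X is an unset awk variable, so it acts as the empty string.
trim_six() {
    awk '{for(i=1;i<=NF;i++) sub(/......$/,X,$i)}1'
}

# printf 'AB11.1\tCB:0078_0.53\n' | trim_six
# prints " CB:007": the 6-character ID is gone, and because "_0.53" is only
# 5 characters long, one character of the term ID is cut off as well.
```

The tab is also replaced by a single space, because awk rebuilds the line with its default output separator once a field is modified.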
