Ignore replacing a comma within quotes
1
0
Entering edit mode
3.8 years ago
vinayjrao ▴ 250

Hi,

I have a file with the following format -

Name,Age,"City, Country",Profession

Mitch,27,"Melbourne,Australia",Student

John,31,"Boston,USA",Chef

I want to make tis a tab separated file, but while ignoring the commas contained within the quotes with the output file looking like -

Name      Age      "City,Country"      Profession

Mitch       27        "Melbourne,Australia"      Student

John        31       "Boston,USA",      Chef

Since I could not get a solution, I converted my file to tab delimited and then tried to revert it using sed with sed 's/"*'\t'*"/"*,*"/g' in order to identify the quotes that contained any text followed by tab, then text again ending with quotes, and replace the tab with a comma, but what this gives me is

Mitch       27        "*,*"Melbourne       Australia"*,*"      Student

Thanks in advance.

sed awk regex • 2.2k views
ADD COMMENT
0
Entering edit mode

is this an xy problem?

$ sed -r 's/,/\t/g;s/("\w+)\t/\1 /g' test.txt 

Name    Age "City  Country" Profession
Mitch   27  "Melbourne Australia"   Student
John    31  "Boston USA"    Chef
ADD REPLY
0
Entering edit mode

Hi,

Thanks for your response, but I feel my question was not clear enough.

When the raw file (.csv) is used, I want to replace all commas (except for the ones in quotes) by tab. In the last part, I mentioned that I converted the csv file entirely to a tab-delimited file, and then tried to convert "Melbourne      Australia" to "Melbourne,Australia" which is not working.

ADD REPLY
0
Entering edit mode

Please post exact input and expected output.

ADD REPLY
0
Entering edit mode

Hi,

The input file has over 700 lines, so I am just putting up the header followed by 2 lines -

Name,Age,"City, Country",Profession

Mitch,27,"Melbourne,Australia",Student

John,31,"Boston,USA",Chef

The expected output is -

Name       Age     "City,Country"      Profession

Mitch         27      "Melbourne,Australia"        Student

John         31       "Boston,USA"      Chef

ADD REPLY
1
Entering edit mode
3.8 years ago

Try this: you should have replaced space with comma in the original code I furnished.

$ sed -r 's/,/\t/g;s/("\w+)\t/\1,/g' test.txt 

Name    Age "City, Country" Profession
Mitch   27  "Melbourne,Australia"   Student
John    31  "Boston,USA"    Chef
ADD COMMENT
0
Entering edit mode

Thanks a lot for the command, but could you please expain it?

ADD REPLY
1
Entering edit mode

sed -r -use extended regular expression.

s/,/\t/g - Replace all commas with tab (first sed command)

; - starts next sed command

s/("\w+)\t/\1,/g - This has two parts: store a pattern, replace pattern followed by tab with pattern it self and comma. Here the pattern is ", one or more word characters. This way of recalling a pattern is called back-referencing.

1 is order of capture pattern. If there are two patterns, they would be called after order of appearance.

A little bit tricky one with same answer is:

$ sed -r 's/,("[^"]*")*/\t\1/g' test.txt 

Name    Age "City, Country" Profession
Mitch   27  "Melbourne,Australia"   Student
John    31  "Boston,USA"    Chef
ADD REPLY
0
Entering edit mode

It works exactly as required. Thanks a lot :)

Please convert your comment to an answer so that I can accept it.

ADD REPLY

Login before adding your answer.

Traffic: 1501 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6