Question

Ignore replacing a comma within quotes

0

Entering edit mode

3.8 years ago

vinayjrao ▴ 250

Hi,

I have a file with the following format -

Name,Age,"City, Country",Profession

Mitch,27,"Melbourne,Australia",Student

John,31,"Boston,USA",Chef

I want to make tis a tab separated file, but while ignoring the commas contained within the quotes with the output file looking like -

Name      Age      "City,Country"      Profession

Mitch       27        "Melbourne,Australia"      Student

John        31       "Boston,USA",      Chef

Since I could not get a solution, I converted my file to tab delimited and then tried to revert it using sed with sed 's/"*'\t'*"/"*,*"/g' in order to identify the quotes that contained any text followed by tab, then text again ending with quotes, and replace the tab with a comma, but what this gives me is

Mitch 27 "*,*"Melbourne Australia"*,*" Student

Thanks in advance.

sed awk regex • 2.2k views

ADD COMMENT • link 3.8 years ago by vinayjrao ▴ 250

0

Entering edit mode

is this an xy problem?

$ sed -r 's/,/\t/g;s/("\w+)\t/\1 /g' test.txt 

Name    Age "City  Country" Profession
Mitch   27  "Melbourne Australia"   Student
John    31  "Boston USA"    Chef

ADD REPLY • link 3.8 years ago by cpad0112 21k

0

Entering edit mode

Hi,

Thanks for your response, but I feel my question was not clear enough.

When the raw file (.csv) is used, I want to replace all commas (except for the ones in quotes) by tab. In the last part, I mentioned that I converted the csv file entirely to a tab-delimited file, and then tried to convert "Melbourne Australia" to "Melbourne,Australia" which is not working.

ADD REPLY • link 3.8 years ago by vinayjrao ▴ 250

0

Entering edit mode

Please post exact input and expected output.

ADD REPLY • link 3.8 years ago by cpad0112 21k

0

Entering edit mode

Hi,

The input file has over 700 lines, so I am just putting up the header followed by 2 lines -

Name,Age,"City, Country",Profession

Mitch,27,"Melbourne,Australia",Student

John,31,"Boston,USA",Chef

The expected output is -

Name       Age     "City,Country"      Profession

Mitch         27      "Melbourne,Australia"        Student

John         31       "Boston,USA"      Chef

ADD REPLY • link 3.8 years ago by vinayjrao ▴ 250

score 1 · Answer 1 · 2021-02-27

1

Entering edit mode

3.8 years ago

cpad0112 21k

Try this: you should have replaced space with comma in the original code I furnished.

$ sed -r 's/,/\t/g;s/("\w+)\t/\1,/g' test.txt 

Name    Age "City, Country" Profession
Mitch   27  "Melbourne,Australia"   Student
John    31  "Boston,USA"    Chef

ADD COMMENT • link 3.8 years ago by cpad0112 21k

0

Entering edit mode

Thanks a lot for the command, but could you please expain it?

ADD REPLY • link 3.8 years ago by vinayjrao ▴ 250

1

Entering edit mode

sed -r -use extended regular expression.

s/,/\t/g - Replace all commas with tab (first sed command)

; - starts next sed command

s/("\w+)\t/\1,/g - This has two parts: store a pattern, replace pattern followed by tab with pattern it self and comma. Here the pattern is ", one or more word characters. This way of recalling a pattern is called back-referencing.

1 is order of capture pattern. If there are two patterns, they would be called after order of appearance.

A little bit tricky one with same answer is:

$ sed -r 's/,("[^"]*")*/\t\1/g' test.txt 

Name    Age "City, Country" Profession
Mitch   27  "Melbourne,Australia"   Student
John    31  "Boston,USA"    Chef