Hello all,
I have a large dataset with 33 column headers. The first header is the unique ID corresponding to and individual and 2 - 33 are different times at which medication use was asked. Each row in column 2 - 32 either contains an ATC code, a blank or $5 for not filled in. There are overlaps and unique ATC codes between 2 - 32 columns. What I would like to do is manipulate this dataset so I can have one column containing the unique ID's, then a column for each ATC code found in the different columns and then a 1 or 0 for each individual to state whether they used the ATC drug or not. Would anyone be able to help me with this?
The original data file looks like this:
id atc1 atc2 atc3
541 C10AA03 D05BX51 $5
469 M01AB05 L02AE02 J01CF05
721 D05BX51 J01CF05 **blank**
The final dataset I am trying to achieve would like this:
# Key: <id>
# id C10AA03 D05BX51 J01CF05 L02AE02 M01AB05
# <int> <int> <int> <int> <int> <int>
#1: 469 0 0 1 1 1
#2: 541 1 1 0 0 0
#3: 721 0 1 1 0 0
The 1 would indicate the individual was reported using the drug while the 0 would indicate they have not.
Thanks in advance,
Dominic
Show us a reproducible example of your data: the header and first 3 lines for starters.
Sure thing.
The original data file looks like this:
The final dataset I am trying to achieve would like this:
The 1 would indicate the individual was reported using the drug while the 0 would indicate they have not.