Combining Overlapping ATC Code Usage in R
1
0
Entering edit mode
3 months ago

Hello all,

I have a large dataset with 33 column headers. The first header is the unique ID corresponding to and individual and 2 - 33 are different times at which medication use was asked. Each row in column 2 - 32 either contains an ATC code, a blank or $5 for not filled in. There are overlaps and unique ATC codes between 2 - 32 columns. What I would like to do is manipulate this dataset so I can have one column containing the unique ID's, then a column for each ATC code found in the different columns and then a 1 or 0 for each individual to state whether they used the ATC drug or not. Would anyone be able to help me with this?

The original data file looks like this:

id atc1 atc2 atc3

541 C10AA03 D05BX51 $5

469 M01AB05 L02AE02 J01CF05

721 D05BX51 J01CF05 **blank**

The final dataset I am trying to achieve would like this:

 # Key: <id>
#      id C10AA03 D05BX51 J01CF05 L02AE02 M01AB05
#   <int>   <int>   <int>   <int>   <int>   <int>
#1:   469       0       0       1       1       1
#2:   541       1       1       0       0       0
#3:   721       0       1       1       0       0

The 1 would indicate the individual was reported using the drug while the 0 would indicate they have not.

Thanks in advance,
Dominic

R pharmacogenetics • 460 views
ADD COMMENT
1
Entering edit mode

Show us a reproducible example of your data: the header and first 3 lines for starters.

ADD REPLY
0
Entering edit mode

Sure thing.

The original data file looks like this:


    id atc1 atc2 atc3
    541 C10AA03 D05BX51 $5
    469 M01AB05 L02AE02 J01CF05
    721 D05BX51 J01CF05 NA

The final dataset I am trying to achieve would like this:

 # Key: <id>
    #      id C10AA03 D05BX51 J01CF05 L02AE02 M01AB05
    #   <int>   <int>   <int>   <int>   <int>   <int>
    #1:   469       0       0       1       1       1
    #2:   541       1       1       0       0       0
    #3:   721       0       1       1       0       0

The 1 would indicate the individual was reported using the drug while the 0 would indicate they have not.

ADD REPLY
1
Entering edit mode
3 months ago
zx8754 12k

We need to reshape twice:

  1. wide-to-long, removing blanks and $5
  2. long-to-wide, try this example:
    library(data.table)

    d <- fread("id atc1 atc2 atc3
    541 C10AA03 D05BX51 $5
    469 M01AB05 L02AE02 J01CF05
    721 D05BX51 J01CF05 NA")

    dcast(melt(d, id.vars = "id", na.rm = TRUE)[ value != "$5", ], 
          formula = id ~ value,
          fun.aggregate = length, 
          fill = 0)
    # Key: <id>
    #      id C10AA03 D05BX51 J01CF05 L02AE02 M01AB05
    #   <int>   <int>   <int>   <int>   <int>   <int>
    #1:   469       0       0       1       1       1
    #2:   541       1       1       0       0       0
    #3:   721       0       1       1       0       0
ADD COMMENT
0
Entering edit mode

Yes! this is exactly what I would require, however I would need to do it for my much larger dataset eg:

d <- fread('1a_v_1_results.csv',  sep=',' header = TRUE)

Now all the information is contained in the columns,

project_pseudo_id atc_code_adu_c_1_01 ... atc_code_adu_c_1_032
ADD REPLY
0
Entering edit mode

d.s.zimmerman : Please use 101010 button to format sections of text as code so they appear in mono-spaced font.

ADD REPLY

Login before adding your answer.

Traffic: 1810 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6