Question

Combining Overlapping ATC Code Usage in R

0

Entering edit mode

10 months ago

d.s.zimmerman • 0

Hello all,

I have a large dataset with 33 column headers. The first header is the unique ID corresponding to and individual and 2 - 33 are different times at which medication use was asked. Each row in column 2 - 32 either contains an ATC code, a blank or $5 for not filled in. There are overlaps and unique ATC codes between 2 - 32 columns. What I would like to do is manipulate this dataset so I can have one column containing the unique ID's, then a column for each ATC code found in the different columns and then a 1 or 0 for each individual to state whether they used the ATC drug or not. Would anyone be able to help me with this?

The original data file looks like this:

id atc1 atc2 atc3

541 C10AA03 D05BX51 $5

469 M01AB05 L02AE02 J01CF05

721 D05BX51 J01CF05 **blank**

The final dataset I am trying to achieve would like this:

 # Key: <id>
#      id C10AA03 D05BX51 J01CF05 L02AE02 M01AB05
#   <int>   <int>   <int>   <int>   <int>   <int>
#1:   469       0       0       1       1       1
#2:   541       1       1       0       0       0
#3:   721       0       1       1       0       0

The 1 would indicate the individual was reported using the drug while the 0 would indicate they have not.

Thanks in advance,
Dominic

R pharmacogenetics • 1.0k views

ADD COMMENT • link 10 months ago by d.s.zimmerman • 0

1

Entering edit mode

Show us a reproducible example of your data: the header and first 3 lines for starters.

ADD REPLY • link 10 months ago by Ram 45k

0

Entering edit mode

Sure thing.

The original data file looks like this:


    id atc1 atc2 atc3
    541 C10AA03 D05BX51 $5
    469 M01AB05 L02AE02 J01CF05
    721 D05BX51 J01CF05 NA

The final dataset I am trying to achieve would like this:

 # Key: <id>
    #      id C10AA03 D05BX51 J01CF05 L02AE02 M01AB05
    #   <int>   <int>   <int>   <int>   <int>   <int>
    #1:   469       0       0       1       1       1
    #2:   541       1       1       0       0       0
    #3:   721       0       1       1       0       0

The 1 would indicate the individual was reported using the drug while the 0 would indicate they have not.

ADD REPLY • link 10 months ago by d.s.zimmerman • 0

Ram · Answer 1 · 2024-08-14

1

Entering edit mode

10 months ago

zx8754 12k

We need to reshape twice:

wide-to-long, removing blanks and $5
long-to-wide, try this example:

    library(data.table)

    d <- fread("id atc1 atc2 atc3
    541 C10AA03 D05BX51 $5
    469 M01AB05 L02AE02 J01CF05
    721 D05BX51 J01CF05 NA")

    dcast(melt(d, id.vars = "id", na.rm = TRUE)[ value != "$5", ], 
          formula = id ~ value,
          fun.aggregate = length, 
          fill = 0)
    # Key: <id>
    #      id C10AA03 D05BX51 J01CF05 L02AE02 M01AB05
    #   <int>   <int>   <int>   <int>   <int>   <int>
    #1:   469       0       0       1       1       1
    #2:   541       1       1       0       0       0
    #3:   721       0       1       1       0       0

ADD COMMENT • link updated 10 months ago by Ram 45k • written 10 months ago by zx8754 12k

0

Entering edit mode

Yes! this is exactly what I would require, however I would need to do it for my much larger dataset eg:

d <- fread('1a_v_1_results.csv',  sep=',' header = TRUE)

Now all the information is contained in the columns,

project_pseudo_id atc_code_adu_c_1_01 ... atc_code_adu_c_1_032

ADD REPLY • link 10 months ago by d.s.zimmerman • 0

0

Entering edit mode

d.s.zimmerman : Please use 101010 button to format sections of text as code so they appear in mono-spaced font.

ADD REPLY • link 10 months ago by GenoMax 152k