Question

subsetting a data frame that has rows with values in more than one column in R

0

2.5 years ago

pramach1 ▴ 40

I have a large data frame with several rows and columns. I need to subset it to the rows that have values in more than one column. This is the data frame.

Sample      Typhi              Kentucky | 8:i:z6
F675BNARV   0                  2000(3%)
F685NARV    0                  0
F722NARV    2038340 (9.24%)    2882679 (13.07%)

I want to keep only row number 4 (F722NARV), since it has values in more than one column. How do I do that? I have tried various forms of subset and sapply. Any help is appreciated.

dataframe R subsets • 1.3k views

ADD COMMENT • link updated 2.5 years ago by rpolicastro 13k • written 2.5 years ago by pramach1 ▴ 40

rpolicastro · Accepted Answer · 2022-12-04

3

2.5 years ago

rpolicastro 13k

It's unclear how your data.frame is formatted exactly (you can share part of it via dput(head(df))), but generally speaking the code will look something like this.

library("dplyr")

# Keep rows where every column except the first is not equal to 0.
# if_all() is already evaluated per row, so rowwise() is not needed here.
df |>
  filter(if_all(!1, \(x) x != 0))

# If you just want more than one column (except the first) to not equal 0.
df |>
  rowwise() |>
  filter(sum(c_across(!1) != 0) > 1) |>
  ungroup()

Again, this code may not work for you depending on how your data.frame is actually formatted, so edit the code as needed or include a reproducible example.
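
For comparison, here is a minimal base R sketch of the second case (more than one non-zero column per row), with no dplyr dependency. The toy data frame reuses the three sample rows from the question; the column names `Typhi` and `Kentucky` are simplified, and the values are assumed to be stored as character strings, so treat this as an illustration rather than a drop-in solution.

```r
# Toy data frame modeled on the sample in the question.
# Assumption: cells are character strings and "0" means "no value".
df <- data.frame(
  Sample   = c("F675BNARV", "F685NARV", "F722NARV"),
  Typhi    = c("0", "0", "2038340 (9.24%)"),
  Kentucky = c("2000(3%)", "0", "2882679 (13.07%)")
)

# Comparing a data frame to a scalar yields a logical matrix;
# rowSums() then counts the non-"0" cells in each row.
nonzero_per_row <- rowSums(df[, -1, drop = FALSE] != "0")

# Keep only the rows with more than one non-"0" column.
result <- df[nonzero_per_row > 1, ]
print(result)
```

On this toy input only the F722NARV row has more than one non-zero column, so it is the only row kept.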

ADD COMMENT • link 2.5 years ago by rpolicastro 13k

1

# If you just want more than one column to not equal 0.
df |> rowwise() |> filter(sum(c_across(!1) != 0) > 1) |> ungroup()

Worked for what I was looking for. Thank you.

ADD REPLY • link 2.5 years ago by pramach1 ▴ 40

0

Thank you. Will try this. In the meantime here is the dput(head(df))

> dput(head(df2))
structure(list(Sample = c("F675BNARV", "F685NARV", "F715NARV", 
"F717NARV", "F722NARV", "F762NARV"), I.48.z4.z24.1.5...48.z4.z24.1.5 = c("0", 
"0", "0", "0", "2038340 (9.24%)", "0"), Kentucky...8.i.z6 = c("0", 
"0", "0", "0", "2882679 (13.07%)", "0"), Molade.or.Wippra...8.z10.z6 = c("831691 (3.69%)", 
"0", "0", "0", "0", "0"), Montevideo...7.g.m.s.NA = c("0", "530046 (2.21%)", 
"7823859 (39.08%)", "0", "0", "0"), Newport...8.e.h.1.2 = c("0", 
"0", "0", "6689807 (22.29%)", "0", "2864791 (9.66%)")), row.names = c(NA, 
6L), class = "data.frame")
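
Since the posted columns are character strings such as "2038340 (9.24%)", one possible variation is to extract the leading read count as a number before filtering. The sketch below rebuilds `df2` from the dput output above and uses base R only; the regular expression and the helper names `counts` and `keep` are my own assumptions about how the cells should be parsed, so adjust as needed.

```r
# Rebuild the data frame from the dput() output posted above.
df2 <- structure(list(
  Sample = c("F675BNARV", "F685NARV", "F715NARV",
             "F717NARV", "F722NARV", "F762NARV"),
  I.48.z4.z24.1.5...48.z4.z24.1.5 = c("0", "0", "0", "0", "2038340 (9.24%)", "0"),
  Kentucky...8.i.z6 = c("0", "0", "0", "0", "2882679 (13.07%)", "0"),
  Molade.or.Wippra...8.z10.z6 = c("831691 (3.69%)", "0", "0", "0", "0", "0"),
  Montevideo...7.g.m.s.NA = c("0", "530046 (2.21%)", "7823859 (39.08%)", "0", "0", "0"),
  Newport...8.e.h.1.2 = c("0", "0", "0", "6689807 (22.29%)", "0", "2864791 (9.66%)")
), row.names = c(NA, 6L), class = "data.frame")

# Assumption: each cell is either "0" or "<count> (<percent>%)".
# Drop everything from the opening parenthesis onward and convert to numeric.
counts <- sapply(df2[-1], function(x) as.numeric(sub("\\s*\\(.*$", "", x)))

# Keep samples with a positive count in more than one serotype column.
keep <- rowSums(counts > 0) > 1
result <- df2[keep, ]
print(result$Sample)
```

Working with numeric counts also makes it possible to filter on a threshold (for example, a minimum read count) instead of simply testing for non-zero values.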

ADD REPLY • link updated 2.5 years ago by rpolicastro 13k • written 2.5 years ago by pramach1 ▴ 40