Question

Loop for removing the complete missing rows in a n number of groups of a dataframe in R

Asked 2.9 years ago by KABILAN

I need to write a loop in R that removes completely missing rows from each of several column groups in a dataframe.

  #Group the columns of a dataframe into consecutive triplicates
  grouping_data<-function(df){                      #df = dataframe
    df_col<-ncol(df)                                #number of columns in the dataframe
    groups<-sort(rep(0:((df_col/3)-1),3))           #group index per column: 0,0,0,1,1,1,...
    id<-list()                                      #empty list for the column indices
    for (i in 1:length(unique(groups))){
      id[[i]]<-which(groups == unique(groups)[i])}  #column indices belonging to group i
    names(id)<-paste0("id",unique(groups))          #name the index list id0, id1, ...
    data<-list()                                    #empty list for the grouped columns
    for (i in 1:length(id)){
      data[[i]]<-df[,id[[i]]]}                      #columns of the dataframe that belong to group i
    names(data)<-paste0("data",unique(groups))      #name the grouped list data0, data1, ...
    return(data)}

  new<-grouping_data(data_input)

With the code above, I create a list of n groups, each containing 3 columns.

My next step is to remove, in each of the n groups, the rows that are completely missing (all 3 values NA). Rows with only 1 or 2 missing values among the 3 columns should be kept. Then I want to combine the filtered groups with a logical AND: a row is kept only if it is not completely missing in any group. That way all groups end up with the same number of rows and can be combined into a single dataframe.

library(dplyr)

test_data <- function(x){
  x %>%
    dplyr::filter(
      # first group: drop rows where columns 1, 2 and 3 are all NA
      !dplyr::if_all(.cols = c(1, 2, 3), .fns = is.na),
      # second group: drop rows where columns 4, 5 and 6 are all NA
      !dplyr::if_all(.cols = c(4, 5, 6), .fns = is.na)
    )
}

  data_new <- test_data(data_input)

I have tried the code above on a 6-column dataframe (2 groups).

However, I am working with mass spectrometry proteomics expression datasets, so the number of groups differs from dataset to dataset. The important point is that every group always has exactly 3 columns. I have attached an image of an example dataset.

[Image: example dataset]

So kindly suggest R loop code that solves this problem and works for datasets with any number of columns.
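Below is a minimal base R sketch of the loop described above. It assumes that every column is a sample column and that the columns come in consecutive blocks of three. `remove_empty_group_rows()` and `group_size` are hypothetical names and do not come from any package. On each pass, the loop marks the rows that still have at least one value in the current group. `&` then combines those marks, which gives the "AND" combination described in the question.

```r
# Hypothetical helper: assumes every column of `df` is a sample column and the
# columns are ordered in consecutive groups of `group_size` (triplicates = 3)
remove_empty_group_rows <- function(df, group_size = 3) {
  stopifnot(ncol(df) %% group_size == 0)
  # group number for each column: 1 1 1 2 2 2 3 3 3 ...
  group_id <- rep(seq_len(ncol(df) / group_size), each = group_size)
  keep <- rep(TRUE, nrow(df))
  for (g in unique(group_id)) {
    cols <- which(group_id == g)
    # TRUE where the row has at least one non-NA value in this group;
    # combining with & keeps only rows that pass in every group
    keep <- keep & rowSums(!is.na(df[, cols, drop = FALSE])) > 0
  }
  df[keep, , drop = FALSE]
}
```

On the 12-column example data posted further down this thread, this should keep original rows 2, 4, 5, 6, 8 and 9. Those are the same six rows as in the expected output.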

Tags: data-frame, proteomics_data, loop, missing_value, R

Can you provide a small reproducible example dataset, and an example of what you want the output to look like? You can share the data by using the dput function on the dataframes and copy/pasting the code here.

Reply by rpolicastro, 2.9 years ago

Thank you, sir, for your interest.

My example dataset is:

    structure(list(`1_3ng` = c(69648445400, 73518145600, NA, NA, 
73529102400, 75481088000, NA, 73545910600, 74473949200, 77396199900
), `2_3ng` = c(71187990600, 70677690400, NA, 73675407400, 73215342700, 
NA, NA, 69996254800, 69795686400, 76951318300), `3_3ng` = c(65032022000, 
71248214000, NA, 72393058300, 72025550900, 71041067000, 73604692000, 
NA, 73324202000, 75969608700), `4_7-5ng` = c(NA, 65845061600, 
75009245100, 64021237700, 66960666600, 69055643600, NA, 64899540900, 
NA, NA), `5_7-5ng` = c(65097201700, NA, NA, 69032126500, NA, 
70189899800, NA, 74143529100, 69299087400, NA), `6_7-5ng` = c(71964413900, 
69048485800, NA, 71281569700, 71167596500, NA, NA, 68389822800, 
69322289200, NA), `7_10ng` = c(71420403700, 67552276500, 72888076300, 
66491357100, NA, 68165019600, 70876631000, NA, 69174190100, 63782945300
), `8_10ng` = c(NA, 71179401200, 68959365100, 70570182700, 73032738800, 
NA, 74807496700, NA, 71812102100, 73855098500), `9_10ng` = c(NA, 
70403756100, NA, 70277421000, 69887731700, 69818871800, NA, 71353886700, 
NA, 74115466700), `10_15ng` = c(NA, NA, 68487581700, NA, NA, 
69056997400, NA, 67780479400, 66804467800, 72291939500), `11_15ng` = c(NA, 
63599643700, NA, NA, 60752029700, NA, NA, 63403655600, NA, 64548492900
), `12_15ng` = c(NA, 67344750600, 61610182700, 67414425600, 65946654700, 
66166118400, NA, 70830837700, 67288305700, 69911451300)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

And this is the result I want to get:

 structure(list(`1_3ng` = c(73518145600, NA, 73529102400, 75481088000, 
73545910600, 74473949200), `2_3ng` = c(70677690400, 73675407400, 
73215342700, NA, 69996254800, 69795686400), `3_3ng` = c(71248214000, 
72393058300, 72025550900, 71041067000, NA, 73324202000), `4_7-5ng` = c(65845061600, 
64021237700, 66960666600, 69055643600, 64899540900, NA), `5_7-5ng` = c(NA, 
69032126500, NA, 70189899800, 74143529100, 69299087400), `6_7-5ng` = c(69048485800, 
71281569700, 71167596500, NA, 68389822800, 69322289200), `7_10ng` = c(67552276500, 
66491357100, NA, 68165019600, NA, 69174190100), `8_10ng` = c(71179401200, 
70570182700, 73032738800, NA, NA, 71812102100), `9_10ng` = c(70403756100, 
70277421000, 69887731700, 69818871800, 71353886700, NA), `10_15ng` = c(NA, 
NA, NA, 69056997400, 67780479400, 66804467800), `11_15ng` = c(63599643700, 
NA, 60752029700, NA, 63403655600, NA), `12_15ng` = c(67344750600, 
67414425600, 65946654700, 66166118400, 70830837700, 67288305700
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L))

I look forward to your answer.

Reply by KABILAN, 2.9 years ago

score 3 · Accepted Answer · 2022-07-11

Answered 2.9 years ago by rpolicastro

Your sample data.

df <- structure(list(`1_3ng` = c(69648445400, 73518145600, NA, NA, 
73529102400, 75481088000, NA, 73545910600, 74473949200, 77396199900
), `2_3ng` = c(71187990600, 70677690400, NA, 73675407400, 73215342700, 
NA, NA, 69996254800, 69795686400, 76951318300), `3_3ng` = c(65032022000, 
71248214000, NA, 72393058300, 72025550900, 71041067000, 73604692000, 
NA, 73324202000, 75969608700), `4_7-5ng` = c(NA, 65845061600, 
75009245100, 64021237700, 66960666600, 69055643600, NA, 64899540900, 
NA, NA), `5_7-5ng` = c(65097201700, NA, NA, 69032126500, NA, 
70189899800, NA, 74143529100, 69299087400, NA), `6_7-5ng` = c(71964413900, 
69048485800, NA, 71281569700, 71167596500, NA, NA, 68389822800, 
69322289200, NA), `7_10ng` = c(71420403700, 67552276500, 72888076300, 
66491357100, NA, 68165019600, 70876631000, NA, 69174190100, 63782945300
), `8_10ng` = c(NA, 71179401200, 68959365100, 70570182700, 73032738800, 
NA, 74807496700, NA, 71812102100, 73855098500), `9_10ng` = c(NA, 
70403756100, NA, 70277421000, 69887731700, 69818871800, NA, 71353886700, 
NA, 74115466700), `10_15ng` = c(NA, NA, 68487581700, NA, NA, 
69056997400, NA, 67780479400, 66804467800, 72291939500), `11_15ng` = c(NA, 
63599643700, NA, NA, 60752029700, NA, NA, 63403655600, NA, 64548492900
), `12_15ng` = c(NA, 67344750600, 61610182700, 67414425600, 65946654700, 
66166118400, NA, 70830837700, 67288305700, 69911451300)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

Here is a tidyverse answer that works for an arbitrary number of groups and group sizes. It assumes that a column name such as 1_3ng encodes a sample id (1) and a mass (3ng), separated by an underscore (_).

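library("tidyverse")

df |>
  rowid_to_column() |>                           # remember the original row of each value
  pivot_longer(!rowid, names_to=c("sample_id", "mass"), names_sep="_") |>  # "1_3ng" -> sample_id "1", mass "3ng"
  group_by(rowid, mass) |>                       # one group = one original row within one mass group
  filter(!all(is.na(value))) |>                  # drop the group if all of its values are NA
  ungroup() |>
  add_count(rowid) |>                            # number of values left for each original row
  filter(n == max(n)) |>                         # keep rows where no group was dropped
  pivot_wider(names_from=c(sample_id, mass), names_sep="_", values_from=value) |>
  select(!c(rowid, n))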

result

# A tibble: 6 × 12
      `1_3ng`   `2_3ng`  `3_3ng` `4_7-5ng` `5_7-5ng` `6_7-5ng` `7_10ng` `8_10ng`
        <dbl>     <dbl>    <dbl>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>
1 73518145600   7.07e10  7.12e10   6.58e10  NA         6.90e10  6.76e10  7.12e10
2          NA   7.37e10  7.24e10   6.40e10   6.90e10   7.13e10  6.65e10  7.06e10
3 73529102400   7.32e10  7.20e10   6.70e10  NA         7.12e10 NA        7.30e10
4 75481088000  NA        7.10e10   6.91e10   7.02e10  NA        6.82e10 NA      
5 73545910600   7.00e10 NA         6.49e10   7.41e10   6.84e10 NA       NA      
6 74473949200   6.98e10  7.33e10  NA         6.93e10   6.93e10  6.92e10  7.18e10
# … with 4 more variables: 9_10ng <dbl>, 10_15ng <dbl>, 11_15ng <dbl>,
#   12_15ng <dbl>
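The sketch below shows what the `pivot_longer()` step with `names_sep = "_"` does. It runs on made-up toy data (the `toy` tibble is hypothetical) and uses the same tibble and tidyr functions as the answer. Each column name is split at the underscore into a `sample_id` part and a `mass` part, and `mass` then serves as the group key. A column name containing more than one underscore would not split cleanly into two parts, so the naming convention matters.

```r
library(tibble)
library(tidyr)

# Made-up toy data using the same "<sample id>_<mass>" column naming
toy <- tibble(`1_3ng` = c(1, NA), `2_3ng` = c(NA, NA), `4_7-5ng` = c(5, 6))

long <- toy |>
  rowid_to_column() |>
  pivot_longer(!rowid, names_to = c("sample_id", "mass"), names_sep = "_")

# long has columns rowid, sample_id, mass and value, one row per cell;
# "4_7-5ng" becomes sample_id "4" and mass "7-5ng"
long
```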

Thank you so much, rpolicastro sir. Your code works amazingly.

Reply by KABILAN, 2.9 years ago

Sir, thanks for your code. It works very nicely. But I have another request. I am developing an R package for my M.Sc. research work, in which the user supplies 'metadata' describing the columns of the working data file, like this:

      structure(list(Sample = "Group", `1_3ng` = "A", `2_3ng` = "A", 
    `3_3ng` = "A", `4_7-5ng` = "B", `5_7-5ng` = "B", `6_7-5ng` = "B", 
    `7_10ng` = "C", `8_10ng` = "C", `9_10ng` = "C", `10_15ng` = "D", 
    `11_15ng` = "D", `12_15ng` = "D"), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -1L))

I need to group the working data based on this information.

How can I modify your code above to use this metadata, and then remove the completely missing rows group-wise?

Reply by KABILAN, 2.9 years ago

You need to pivot your group data to long format first (this assumes the one-row metadata tibble is stored in groups).

groups <- groups |>
  select(!Sample) |>
  pivot_longer(everything(), values_to="group")

Then it's just a few modifications to the previous code.

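df |>
  rowid_to_column() |>
  pivot_longer(!rowid, values_to="mass") |>       # column names go to "name", values to "mass"
  inner_join(groups, by="name") |>                # attach each sample's group label from the metadata
  group_by(rowid, group) |>
  filter(!all(is.na(mass))) |>                    # drop a group within a row if all of its values are NA
  ungroup() |>
  add_count(rowid) |>
  filter(n == max(n)) |>                          # keep rows where no group was dropped
  select(!c(group, n)) |>                         # drop helper columns before widening
  pivot_wider(names_from=name, values_from=mass) |>
  select(!rowid)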
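Here is a base R sketch of the same metadata-driven filter. It assumes the metadata is the one-row table shown above: a `Sample` column, followed by one column per sample holding that sample's group label. `remove_empty_rows_meta()` is a hypothetical name. Only the columns listed in the metadata are checked, so extra columns in the data (for example an ID column) pass through unchanged. The sketch keeps a row only if none of its groups is entirely NA. The `filter(n == max(n))` step above gives the same result whenever at least one row is non-empty in every group.

```r
# Hypothetical helper: `meta` is the one-row metadata table (a "Sample" column
# followed by one column per sample holding that sample's group label)
remove_empty_rows_meta <- function(df, meta) {
  sample_cols <- setdiff(names(meta), "Sample")
  # named character vector: names = sample columns, values = group labels
  grp <- vapply(meta[sample_cols], function(x) as.character(x[1]), character(1))
  # one logical vector per group: TRUE where the row has >= 1 non-NA value
  per_group <- lapply(split(names(grp), grp), function(cols) {
    rowSums(!is.na(df[, cols, drop = FALSE])) > 0
  })
  # AND across all groups: a row survives only if no group is entirely NA
  keep <- Reduce(`&`, per_group)
  df[keep, , drop = FALSE]
}
```

To use it, call `remove_empty_rows_meta(df, meta)`, where `meta` is the original one-row metadata tibble before pivoting.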

Reply by rpolicastro, 2.9 years ago

Thank you so much, rpolicastro sir.

Reply by KABILAN, 2.9 years ago

Sir, I have a dataframe that contains many triplicates (sets of 3 columns), and I have split it into a list with one element per triplicate.

The example dataset is:

example_data <- structure(list(`1_3ng` = c(69648445400, 73518145600, NA, NA, 
73529102400, 75481088000, NA, 73545910600, 74473949200, 77396199900
), `2_3ng` = c(71187990600, 70677690400, NA, 73675407400, 73215342700, 
NA, NA, 69996254800, 69795686400, 76951318300), `3_3ng` = c(65032022000, 
71248214000, NA, 72393058300, 72025550900, 71041067000, 73604692000, 
NA, 73324202000, 75969608700), `4_7-5ng` = c(NA, 65845061600, 
75009245100, 64021237700, 66960666600, 69055643600, NA, 64899540900, 
NA, NA), `5_7-5ng` = c(65097201700, NA, NA, 69032126500, NA, 
70189899800, NA, 74143529100, 69299087400, NA), `6_7-5ng` = c(71964413900, 
69048485800, NA, 71281569700, 71167596500, NA, NA, 68389822800, 
69322289200, NA), `7_10ng` = c(71420403700, 67552276500, 72888076300, 
66491357100, NA, 68165019600, 70876631000, NA, 69174190100, 63782945300
), `8_10ng` = c(NA, 71179401200, 68959365100, 70570182700, 73032738800, 
NA, 74807496700, NA, 71812102100, 73855098500), `9_10ng` = c(NA, 
70403756100, NA, 70277421000, 69887731700, 69818871800, NA, 71353886700, 
NA, 74115466700), `10_15ng` = c(NA, NA, 68487581700, NA, NA, 
69056997400, NA, 67780479400, 66804467800, 72291939500), `11_15ng` = c(NA, 
63599643700, NA, NA, 60752029700, NA, NA, 63403655600, NA, 64548492900
), `12_15ng` = c(NA, 67344750600, 61610182700, 67414425600, 65946654700, 
66166118400, NA, 70830837700, 67288305700, 69911451300)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

After grouping I get a list of four data frames, since the example dataset contains 4 groups. I used the following R code for grouping:

grouping_data<-function(df){                      #df = dataframe
  df_col<-ncol(df)                                #number of columns in the dataframe
  groups<-sort(rep(0:((df_col/3)-1),3))           #group index per column: 0,0,0,1,1,1,...
  id<-list()                                      #empty list for the column indices
  for (i in 1:length(unique(groups))){
    id[[i]]<-which(groups == unique(groups)[i])}  #column indices belonging to group i
  names(id)<-paste0("id",unique(groups))          #name the index list id0, id1, ...
  data<-list()                                    #empty list for the grouped columns
  for (i in 1:length(id)){
    data[[i]]<-df[,id[[i]]]}                      #columns of the dataframe that belong to group i
  names(data)<-paste0("data",unique(groups))      #name the grouped list data0, data1, ...
  return(data)}
group_data <-grouping_data(example_data)

Please suggest R code, sir, to apply a particular function to all the list elements at the same time.

For example, I wrote the function below:

     #VSN Normalization
      vsnNorm <- function(dat) {
        dat<-as.data.frame(dat)
        vsnNormed <- suppressMessages(vsn::justvsn(as.matrix(dat)))
        colnames(vsnNormed) <- colnames(dat)
        row.names(vsnNormed) <- rownames(dat)
        return(as.matrix(vsnNormed))
      }

And I call it like this:

      vsn.dat0 <- vsnNorm(group_data$data0)
      vsn.dat1 <- vsnNorm(group_data$data1)
      vsn.dat2 <- vsnNorm(group_data$data2)
      vsn.dat3 <- vsnNorm(group_data$data3)
      vsn.dat <- cbind (vsn.dat0,vsn.dat1,vsn.dat2,vsn.dat3)

It works well, sir. But the number of triplicates (3-column sets) changes from dataset to dataset, and calling every list element by hand each time is tedious.

So kindly share some code that applies a function to every element of the resulting list and combines the results into a single object.

Thank you sir.

Reply by KABILAN, 2.9 years ago

Sir, I finally found an answer to the above problem:

vsn.dat <- do.call("cbind", lapply(group_data, vsnNorm))
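The self-contained sketch below shows the same `lapply()` plus `do.call(cbind, ...)` pattern. It also uses `split.default()` as a loop-free alternative to `grouping_data()`. It runs on a made-up toy data frame (`toy_data`). `log2_norm()` is a hypothetical stand-in for `vsnNorm()`, so the example does not need the Bioconductor vsn package.

```r
# Made-up 6-column toy data frame (two triplicates)
toy_data <- data.frame(a1 = c(2, 4), a2 = c(8, 16), a3 = c(1, 2),
                       b1 = c(4, 8), b2 = c(2, 2), b3 = c(32, 64))

# Alternative to grouping_data(): split.default() treats a data frame as a
# list of columns, so this returns one data frame per block of 3 columns
# (assumes ncol() is a multiple of 3)
group_index <- rep(seq_len(ncol(toy_data) / 3), each = 3)
group_data <- split.default(toy_data, group_index)
names(group_data) <- paste0("data", seq_along(group_data) - 1)

# Hypothetical stand-in for vsnNorm(): a log2 transform, used only so the
# example runs without the Bioconductor 'vsn' package
log2_norm <- function(dat) log2(as.matrix(dat))

# lapply() calls the function once per list element, however many there are;
# do.call(cbind, ...) then binds the resulting matrices side by side
normed <- do.call(cbind, lapply(group_data, log2_norm))
```

Because `lapply()` iterates over whatever elements the list has, nothing in this code depends on the number of triplicates.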

Reply by KABILAN, 2.9 years ago

Sir, I need to split my dataframe by group and then extract the rows with the 3 smallest values in particular columns.

For example, with two groups of data I get a result like this:

structure(list(Type = c("knn_vsn", "knn_vsn", "knn_loess", "knn_loess", 
"knn_rlr", "knn_rlr", "lls_vsn", "lls_vsn", "lls_loess", "lls_loess", 
"lls_rlr", "lls_rlr", "svd_vsn", "svd_vsn", "svd_loess", "svd_loess", 
"svd_rlr", "svd_rlr"), PCV = c(0.00510741446572374, 0.00705765780896556, 
0.00509233659481246, 0.00696732302441824, 0.00509225712407119, 
0.00696173227550932, 0.00492983133396127, 0.00669466376079551, 
0.00491874477556813, 0.0066283342182998, 0.00493450413250135, 
0.00663684901164831, 0.00731828997356189, 0.0106867134410024, 
0.00729635842702563, 0.0105680795904369, 0.00730343601772899, 
0.0105334181341163)), class = "data.frame", row.names = c(NA, 
-18L))

I then split the dataframe with the following code:

library(dplyr)

#Splitting alternate rows into two groups (odd rows = group 1, even rows = group 2)
row_odd <- seq_len(nrow(total_pcv)) %% 2
data_row_odd <- total_pcv[row_odd == 1, ]
data_row_even <- total_pcv[row_odd == 0, ]
total_pcv_all <- cbind(data_row_odd, data_row_even)
colnames(total_pcv_all) <- c("Group1", "PCV1", "Group2", "PCV2")
rownames(total_pcv_all) <- NULL

#Extracting top 3 minimum values in particular columns
total_pcv_all <- total_pcv_all %>% slice_min(PCV1, n = 3) %>% slice_min(PCV2, n = 3)

And the output looks like this:

structure(list(Group1 = c("lls_loess", "lls_rlr", "lls_vsn"), 
    PCV1 = c(0.00491874477556813, 0.00493450413250135, 0.00492983133396127
    ), Group2 = c("lls_loess", "lls_rlr", "lls_vsn"), PCV2 = c(0.0066283342182998, 
    0.00663684901164831, 0.00669466376079551)), class = "data.frame", row.names = c(NA, 
-3L))

This is for two groups of data.

If the number of groups increases beyond two, how should I modify the code above, or is there a better way to handle this?

For example, here is a dataframe with four groups:

structure(list(Type = c("knn_vsn", "knn_vsn", "knn_vsn", "knn_vsn", 
"knn_loess", "knn_loess", "knn_loess", "knn_loess", "knn_rlr", 
"knn_rlr", "knn_rlr", "knn_rlr", "lls_vsn", "lls_vsn", "lls_vsn", 
"lls_vsn", "lls_loess", "lls_loess", "lls_loess", "lls_loess", 
"lls_rlr", "lls_rlr", "lls_rlr", "lls_rlr", "svd_vsn", "svd_vsn", 
"svd_vsn", "svd_vsn", "svd_loess", "svd_loess", "svd_loess", 
"svd_loess", "svd_rlr", "svd_rlr", "svd_rlr", "svd_rlr"), PCV = c(0.00318368971435714, 
0.0056588221783197, 0.00418838138878096, 0.0039811913527127, 
0.00317086486813191, 0.00560933517836751, 0.00417201215938804, 
0.00394649435912413, 0.00317086486813191, 0.00560933517836751, 
0.00417201215938804, 0.00394649435912413, 0.00312821095645019, 
0.00550114679857588, 0.00398819978362592, 0.00397059873107098, 
0.00311632537571597, 0.00548316209864631, 0.00397093259462351, 
0.00393840233766712, 0.00313568333628438, 0.00550230673346083, 
0.00398827962107259, 0.00396385071387178, 0.00394831935666465, 
0.00737865310351839, 0.00424157479553304, 0.0041077267588457, 
0.00393605637633005, 0.0073411154394253, 0.00422638750183658, 
0.00407577176849463, 0.00395599132474446, 0.00735748595511963, 
0.00424175886713471, 0.00410191492380459)), class = "data.frame", row.names = c(NA, 
-36L))

For this case, I modified the code above like this:

row_odd <- seq_len(nrow(total_pcv)) %% 4
data_row_odd0 <- total_pcv[row_odd == 1, ]    # rows 1, 5, 9, ...  -> group 1
data_row_even0 <- total_pcv[row_odd == 2, ]   # rows 2, 6, 10, ... -> group 2
data_row_odd1 <- total_pcv[row_odd == 3, ]    # rows 3, 7, 11, ... -> group 3
data_row_even1 <- total_pcv[row_odd == 0, ]   # rows 4, 8, 12, ... -> group 4

total_pcv_all <- cbind(data_row_odd0, data_row_even0, data_row_odd1, data_row_even1)
colnames(total_pcv_all) <- c("Group1", "PCV1", "Group2", "PCV2", "Group3", "PCV3", "Group4", "PCV4")
rownames(total_pcv_all) <- NULL
total_pcv_all <- total_pcv_all %>% slice_min(PCV1, n = 3) %>% slice_min(PCV2, n = 3) %>% slice_min(PCV3, n = 3) %>% slice_min(PCV4, n = 3)

And the output looks like this:

structure(list(Group1 = c("lls_loess", "lls_rlr", "lls_vsn"), 
    PCV1 = c(0.00311632537571597, 0.00313568333628438, 0.00312821095645019
    ), Group2 = c("lls_loess", "lls_rlr", "lls_vsn"), PCV2 = c(0.00548316209864631, 
    0.00550230673346083, 0.00550114679857588), Group3 = c("lls_loess", 
    "lls_rlr", "lls_vsn"), PCV3 = c(0.00397093259462351, 0.00398827962107259, 
    0.00398819978362592), Group4 = c("lls_loess", "lls_rlr", 
    "lls_vsn"), PCV4 = c(0.00393840233766712, 0.00396385071387178, 
    0.00397059873107098)), class = "data.frame", row.names = c(NA, 
-3L))

Kindly suggest some code to automate this operation for any number of groups, sir. Thank you in advance, sir.
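One way to generalise the odd/even split is sketched below in base R. It assumes the rows of `total_pcv` are interleaved by group exactly as in the two examples above: group 1, group 2, ..., group k, then group 1 again. `top_min_by_group()`, `n_groups` and `n` are hypothetical names. The final loop reproduces the chained `slice_min()` calls. It keeps the `n` smallest PCV1 rows, then re-sorts those rows by each later PCV column, so the last column sets the final order. Unlike `slice_min()`, which keeps ties by default, `head()` here stops at exactly `n` rows.

```r
# Hypothetical helper generalising the odd/even row split to any number of
# groups. Assumes `total_pcv` has exactly two columns (Type, PCV) and that its
# rows cycle through the groups: group 1, group 2, ..., group k, group 1, ...
top_min_by_group <- function(total_pcv, n_groups, n = 3) {
  stopifnot(ncol(total_pcv) == 2, nrow(total_pcv) %% n_groups == 0)
  # group number of each row: 1, 2, ..., n_groups, 1, 2, ...
  grp <- (seq_len(nrow(total_pcv)) - 1) %% n_groups + 1
  # one data frame per group, with row names reset so cbind() aligns by position
  parts <- lapply(split(total_pcv, grp), function(p) { rownames(p) <- NULL; p })
  wide <- do.call(cbind, unname(parts))
  colnames(wide) <- paste0(rep(c("Group", "PCV"), n_groups),
                           rep(seq_len(n_groups), each = 2))
  # mimic the chained slice_min() calls: keep the n smallest PCV1 rows, then
  # re-sort them by PCV2, PCV3, ... (head() drops ties beyond n rows)
  for (col in paste0("PCV", seq_len(n_groups))) {
    wide <- head(wide[order(wide[[col]]), , drop = FALSE], n)
  }
  rownames(wide) <- NULL
  wide
}
```

Suppose the four-group data above is stored as `total_pcv`. Then `top_min_by_group(total_pcv, n_groups = 4)` should return lls_loess, lls_rlr and lls_vsn with the same PCV1 to PCV4 values as the four-group output shown. With the two-group data, `n_groups = 2` covers that case too.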

Reply by KABILAN, 2.9 years ago

rpolicastro sir, a few days ago I asked "Loop for removing the complete missing rows in a n number of groups of a dataframe in R" about the following sample data:

df <- structure(list(`1_3ng` = c(69648445400, 73518145600, NA, NA, 
73529102400, 75481088000, NA, 73545910600, 74473949200, 77396199900
), `2_3ng` = c(71187990600, 70677690400, NA, 73675407400, 73215342700, 
NA, NA, 69996254800, 69795686400, 76951318300), `3_3ng` = c(65032022000, 
71248214000, NA, 72393058300, 72025550900, 71041067000, 73604692000, 
NA, 73324202000, 75969608700), `4_7-5ng` = c(NA, 65845061600, 
75009245100, 64021237700, 66960666600, 69055643600, NA, 64899540900, 
NA, NA), `5_7-5ng` = c(65097201700, NA, NA, 69032126500, NA, 
70189899800, NA, 74143529100, 69299087400, NA), `6_7-5ng` = c(71964413900, 
69048485800, NA, 71281569700, 71167596500, NA, NA, 68389822800, 
69322289200, NA), `7_10ng` = c(71420403700, 67552276500, 72888076300, 
66491357100, NA, 68165019600, 70876631000, NA, 69174190100, 63782945300
), `8_10ng` = c(NA, 71179401200, 68959365100, 70570182700, 73032738800, 
NA, 74807496700, NA, 71812102100, 73855098500), `9_10ng` = c(NA, 
70403756100, NA, 70277421000, 69887731700, 69818871800, NA, 71353886700, 
NA, 74115466700), `10_15ng` = c(NA, NA, 68487581700, NA, NA, 
69056997400, NA, 67780479400, 66804467800, 72291939500), `11_15ng` = c(NA, 
63599643700, NA, NA, 60752029700, NA, NA, 63403655600, NA, 64548492900
), `12_15ng` = c(NA, 67344750600, 61610182700, 67414425600, 65946654700, 
66166118400, NA, 70830837700, 67288305700, 69911451300)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

And the sample metadata (group data) is:

structure(list(Sample = "Group", `1_3ng` = "A", `2_3ng` = "A", 
    `3_3ng` = "A", `4_7-5ng` = "B", `5_7-5ng` = "B", `6_7-5ng` = "B", 
    `7_10ng` = "C", `8_10ng` = "C", `9_10ng` = "C", `10_15ng` = "D", 
    `11_15ng` = "D", `12_15ng` = "D"), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -1L))

And you suggested the code below, sir:

groups <- groups |>
  select(!Sample) |>
  pivot_longer(everything(), values_to="group")

df |>
  rowid_to_column() |>
  pivot_longer(!rowid, values_to="mass") |>
  inner_join(groups, by="name") |>
  group_by(rowid, group) |>
  filter(!all(is.na(mass))) |>
  ungroup() |>
  add_count(rowid) |>
  filter(n == max(n)) |>
  select(!c(group, n)) |>
  pivot_wider(names_from=name, values_from=mass) |>
  select(!rowid)

It worked perfectly, sir. But if I add one extra column containing the Uniprot_IDs to the sample data, like below,

df1<-structure(list(Uniprot_IDs = c("P0A6Y8|DNAK", "P0A853|TNAA", 
"P0CE47|EFTU1", "P0A6F3|GLPK", "P0A6F5|CH60", "P0A9B2|G3P1", 
"P69908|DCEA", "P02925|RBSB", "P0A6P1|EFTS", "P0A799|PGK"), `1_3ng` = c(12305960196.5721, 
24169710612.0476, NA, 8553811608.70032, 13176265141.6301, 92994780469.5607, 
11373139178.993, NA, 8062061247.94512, 3484150815.20598), `2_3ng` = c(11629654800, 
25162283400, 31864546300, 8157173240, 12812379370, 90007498700, 
10191440110, NA, 7911370530, 3406054010), `3_3ng` = c(12503938417.8663, 
25733015601.0117, 34727094361.2997, 8857104380.18179, NA, 93988723611.341, 
11653192532.4546, NA, 7933102839.01341, NA), `4_7-5ng` = c(NA, 
79582218995.1549, 77615759060.3497, 21749287984.8341, 33342436650.5148, 
101254055758.836, 30624750667.6451, 39438567251.7351, 10726988796.4798, 
7850501475.22747), `5_7-5ng` = c(NA, 78743355495.2545, 81948536416.9992, 
NA, 34617564902.3219, 99485017820.8478, NA, 40420212151.9563, 
14804870783.7792, 8280398872.03417), `6_7-5ng` = c(NA, 80272416055.8845, 
77019098847.8474, 23045479130.9574, 32885483296.8046, 90789109337.1181, 
30678346321.0037, 37073444001.0421, 13710097518.7425, 7916821420.64152
), `7_10ng` = c(22617928037.5148, 97473230025.8853, 91579176089.4265, 
28086665669.9634, 38033883000.8102, NA, 37181868033.5073, 44274304023.6936, 
NA, 9288965106.5049), `8_10ng` = c(22091136513.3736, NA, 90754802145.7813, 
26405368418.6503, 36442770423.3661, NA, 36789459227.7515, 42793252584.0984, 
15307787846.1716, 8834742124.86943), `9_10ng` = c(24125219176.3177, 
98420360686.1339, 99355131865.2305, 28271975548.9608, 39837381317.8216, 
NA, 39481996086.9157, 47261977623.5276, 16463020175.2068, 9931809132.696
), `10_15ng` = c(30252776887.1842, 141726904178.35, 130889671408.26, 
38206477283.6549, 56021084469.4745, 100336249543.662, 53295491175.4506, 
62883519160.5278, NA, 13994955303.4972), `11_15ng` = c(32859283128.8916, 
161633827056.573, NA, 45497410866.4248, 61586094337.2513, NA, 
60508117975.6097, 73276218943.4545, NA, 15400735421.5), `12_15ng` = c(34372085877.8071, 
165557046117.222, 153975644961.53, 46279635074.4959, 61867667358.3367, 
106133922907.254, 63526552497.161, 76374667334.5682, NA, 15329671283.3959
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-10L))

If I run the code above on this dataset, I get the following error:

Error in `pivot_longer_spec()`:
! Can't combine `Uniprot_IDs` <character> and `1_3ng` <double>.
Run `rlang::last_error()` to see where the error occurred.

So kindly suggest how to modify the code above for this dataset, sir.

Reply by KABILAN, 2.9 years ago

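Remove rowid_to_column() and select(!rowid), and replace every other occurrence of rowid with `Uniprot_IDs`. This keeps the character ID column out of pivot_longer(), which cannot combine it with the numeric intensity columns.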

A general word of advice: this isn't meant to be a code-writing service. You should step through any code you're given here line by line so you know what it's doing. I also recommend reading R for Data Science to get caught up with the tidyverse.
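If you apply the suggested change literally, you get roughly the pipeline below. It is a sketch that assumes the ID column is named `Uniprot_IDs` (as in `df1`), that the IDs are unique, and that the metadata has been pivoted into `groups` as before. The toy `df1` and `meta` here are made up so the block runs on its own. Because `Uniprot_IDs` is excluded from `pivot_longer()`, only the numeric columns are stacked, which avoids the "Can't combine" error. The IDs then take over the role of rowid.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up toy data: one character ID column plus two groups of two samples
df1 <- tibble(Uniprot_IDs = c("P1", "P2", "P3"),
              a1 = c(1, NA, NA), a2 = c(2, NA, 5),
              b1 = c(NA, 3, NA), b2 = c(4, 6, NA))
meta <- tibble(Sample = "Group", a1 = "A", a2 = "A", b1 = "B", b2 = "B")

# metadata to long format: one row per sample column with its group label
groups <- meta |>
  select(!Sample) |>
  pivot_longer(everything(), values_to = "group")

res <- df1 |>
  pivot_longer(!Uniprot_IDs, values_to = "mass") |>   # ID column is not stacked
  inner_join(groups, by = "name") |>
  group_by(Uniprot_IDs, group) |>
  filter(!all(is.na(mass))) |>          # drop a group if all its values are NA
  ungroup() |>
  add_count(Uniprot_IDs) |>
  filter(n == max(n)) |>                # keep IDs where no group was dropped
  select(!c(group, n)) |>
  pivot_wider(names_from = name, values_from = mass)
```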

Reply by rpolicastro, 2.9 years ago

Yes, sir. I accept your suggestion, and I will also read the resource you linked. Thank you, sir.

Reply by KABILAN, 2.9 years ago