Question

How do I create and populate new sample columns based on values (sample barcodes) from an existing column?

4.1 years ago

andresllucena ▴ 40

I have a data frame as follows:

miRNA_region  barcode                      read_count
MIMAT0000062  TCGA-05-4244-01A-01T-1108-13      14492
MIMAT0000063  TCGA-05-4244-01A-01T-1108-13       8767
MIMAT0000064  TCGA-05-4244-01A-01T-1108-13        610
MIMAT0000065  TCGA-05-4244-01A-01T-1108-13        750
MIMAT0000066  TCGA-05-4244-01A-01T-1108-13        804
MIMAT0000067  TCGA-05-4244-01A-01T-1108-13       4748
MIMAT0000062  TCGA-05-4384-01A-01T-1754-13     505712
MIMAT0000063  TCGA-05-4384-01A-01T-1754-13     121127
MIMAT0000064  TCGA-05-4384-01A-01T-1754-13      12833
MIMAT0000065  TCGA-05-4384-01A-01T-1754-13       1455
MIMAT0000067  TCGA-05-4384-01A-01T-1754-13      15284

Each barcode corresponds to a different sample. I need to convert the values in the "barcode" column into new columns and get something like:

miRNA_region    TCGA-05-4244-01A-01T-1108-13    TCGA-05-4384-01A-01T-1754-13
MIMAT0000062    14492                           505712
MIMAT0000063    8767                            121127
MIMAT0000064    610                             12833
MIMAT0000065    750                             1455
MIMAT0000066    804                             15284

R TCGA • 1.2k views

score 1 · Answer 1 · 2021-01-19

There are values in your example output that don't appear in your example data, but I'll assume you just want to convert your data from long to wide format.

Example data:

df <- structure(list(miRNA_region = c("MIMAT0000062", "MIMAT0000063", 
"MIMAT0000062", "MIMAT0000063"), barcode = c("TCGA-05-4244-01A-01T-1108-13", 
"TCGA-05-4244-01A-01T-1108-13", "TCGA-05-4384-01A-01T-1754-13", 
"TCGA-05-4384-01A-01T-1754-13"), read_count = c(14492, 8767, 
505712, 121127)), class = "data.frame", row.names = c(NA, -4L
))

> df
  miRNA_region                      barcode read_count
1 MIMAT0000062 TCGA-05-4244-01A-01T-1108-13      14492
2 MIMAT0000063 TCGA-05-4244-01A-01T-1108-13       8767
3 MIMAT0000062 TCGA-05-4384-01A-01T-1754-13     505712
4 MIMAT0000063 TCGA-05-4384-01A-01T-1754-13     121127

You can do this with pivot_wider from tidyr (part of the tidyverse). See vignette("pivot") for more information.

library("tidyr")

df_wide <- pivot_wider(df, names_from=barcode, values_from=read_count)

> df_wide
# A tibble: 2 x 3
  miRNA_region `TCGA-05-4244-01A-01T-1108-13` `TCGA-05-4384-01A-01T-1754-13`
  <chr>                                 <dbl>                          <dbl>
1 MIMAT0000062                          14492                         505712
2 MIMAT0000063                           8767                         121127
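
One thing to watch for in the full data from the question: MIMAT0000066 has no row for the second barcode. Here is a minimal sketch of how pivot_wider treats that case. The df_gap data frame is made up for illustration, and filling with 0 is an assumption on my part (keep the default NA if a missing row should not be read as zero reads).

```r
library("tidyr")

# Hypothetical long-format data: MIMAT0000066 is absent from the second sample
df_gap <- data.frame(
  miRNA_region = c("MIMAT0000062", "MIMAT0000066", "MIMAT0000062"),
  barcode = c("TCGA-05-4244-01A-01T-1108-13",
              "TCGA-05-4244-01A-01T-1108-13",
              "TCGA-05-4384-01A-01T-1754-13"),
  read_count = c(14492, 804, 505712)
)

# Default behaviour: missing miRNA/barcode combinations become NA
wide_na <- pivot_wider(df_gap, names_from = barcode, values_from = read_count)

# values_fill replaces those gaps with a chosen value (0 here, by assumption)
wide_zero <- pivot_wider(df_gap, names_from = barcode, values_from = read_count,
                         values_fill = list(read_count = 0))
```

Whether 0 or NA is the right choice depends on what a missing row means in your files, so check that before filling.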

If you have a lot of data, it's usually quicker and more memory-efficient to do this with dcast from data.table. Note that setDT converts df to a data.table in place, without making a copy. See vignette("datatable-reshape") for more information.

library("data.table")

setDT(df)
df_wide <- dcast(df, miRNA_region ~ barcode, value.var="read_count")

> df_wide
   miRNA_region TCGA-05-4244-01A-01T-1108-13 TCGA-05-4384-01A-01T-1754-13
1: MIMAT0000062                        14492                       505712
2: MIMAT0000063                         8767                       121127
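
The same caveat about missing miRNA/barcode combinations applies to dcast. Below is a rough sketch using its fill argument. As before, dt_gap is made-up illustration data and the choice of 0 as the fill value is an assumption, not something the question specifies.

```r
library("data.table")

# Hypothetical long-format data: MIMAT0000066 is absent from the second sample
dt_gap <- data.table(
  miRNA_region = c("MIMAT0000062", "MIMAT0000066", "MIMAT0000062"),
  barcode = c("TCGA-05-4244-01A-01T-1108-13",
              "TCGA-05-4244-01A-01T-1108-13",
              "TCGA-05-4384-01A-01T-1754-13"),
  read_count = c(14492, 804, 505712)
)

# Without fill, the missing combination is NA
dt_wide_na <- dcast(dt_gap, miRNA_region ~ barcode, value.var = "read_count")

# fill = 0 puts 0 in the gaps instead (an assumption about what a gap means)
dt_wide_zero <- dcast(dt_gap, miRNA_region ~ barcode, value.var = "read_count",
                      fill = 0)
```

Also be aware that if the same miRNA/barcode pair occurs more than once, dcast falls back to counting rows (it prints a message about defaulting to length), so pass fun.aggregate explicitly, e.g. sum, if your data can contain duplicates.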