Using awk is definitely the way to go if this is all you need to do, but I figured I would post a little R solution since you asked, and in case you or anyone wanted it.
Example data.
df <- structure(list(X1 = c(34L, 2L, 88L, 90L, 52L), X2 = c(NA, 74L,
80L, 67L, 44L), X3 = c(NA, NA, 65L, 46L, 73L), X4 = c(NA, NA,
NA, 73L, 71L), X5 = c(NA, NA, NA, NA, 50L)), class = "data.frame", row.names = c(NA,
-5L))
> df
X1 X2 X3 X4 X5
1 34 NA NA NA NA
2 2 74 NA NA NA
3 88 80 65 NA NA
4 90 67 46 73 NA
5 52 44 73 71 50
tidyverse solution
library("tidyverse")
library("readxl")
# To read in your excel file.
# df <- read_excel("path/to/file")
df <- df %>%
pivot_longer(everything(), names_to="column", values_to="value") %>%
drop_na %>%
arrange(column)
You should then have a data.frame from wide to long format.
> df
# A tibble: 15 x 2
column val
<chr> <int>
1 X1 34
2 X1 2
3 X1 88
4 X1 90
5 X1 52
6 X2 74
7 X2 80
8 X2 67
9 X2 44
10 X3 65
11 X3 46
12 X3 73
13 X4 73
14 X4 71
15 X5 50
Use
awk
, not R. Save the file as a TSV and work on that. For each column, save that column's header in a variable and for each non blank row in that column, print the header-variable followed by the "cell" content.Try the above yourself and let us know if you have specific questions.