how to merge multiple files based on column header
3
0
Entering edit mode
12 months ago
Bioinfonext ▴ 470

Hi, I would like to merge three files based on column header Sample_ID and file should only have those samples that are common in all these three file. Could you please help with this.

in.sites <- read.table("CpG.csv", header=T, sep=",", as.is=T, na.strings="NA")
    in.sites[c(1:3), c(1:3)]
        Sample_ID Mean_M_from_TRUEinvar_per_person n_TRUEinvar_CpGs_in_split
    1 NSS.1.0093                        -3.857742                     77095
    2 NSS.1.0095                        -3.761795                     77095
    3 NSS.1.0096                        -3.715694                     77095
     pheno <- read.table("Phe_121023.csv", header=T, sep=",", as.is=T, na.strings="NA")
    pheno[c(1:3), c(1:3)]
        Sample_ID         BeacChip.ID   Sentrix_ID
    1 NSS.1.0093 200772280026_R05C01 200772280026
    2 NSS.1.0095 200772280026_R07C01 200772280026
    3 NSS.1.0096 200772280026_R08C01 200772280026
     PCs <- read.table("Control_probe_PCs_all_preprocessed.txt", header=T, sep="\t", as.is=T, na.strings="NA")
    PCs[c(1:3), c(1:3)]
        Sample_ID       PC1      PC2
    1 NSS.1.0093 -25382.95 22243.17
    2 NSS.1.0095 -29640.00 27610.33
    3 NSS.1.0096 -41261.36 30188.37

Many thanks

R statistics biostatistics • 817 views
ADD COMMENT
1
Entering edit mode
12 months ago
bk11 ★ 3.0k

A simple way to do will be like this-

tmp=merge(in.sites, pheno, by="Sample_ID")
final=merge(tmp, PCs, by="Sample_ID"

write.table(final, file="your_merged_commonData.txt", sep="\t")
ADD COMMENT
1
Entering edit mode
12 months ago
zau saa ▴ 150
library(dplyr)
tmp <- inner_join(in.sites, pheno, by = "Sample_ID")
merged <- inner_join(tmp, PCs, by = "Sample_ID")
ADD COMMENT
0
Entering edit mode

my answer is ok now. I misunderstand the question before.

ADD REPLY
0
Entering edit mode
12 months ago
iraun 6.2k

Not tested, but maybe something like this might do the job.

list_of_frames <- list(df1, df2, df3)

# Extract common Sample_IDs
common_samples <- Reduce(intersect, lapply(list_of_frames, function(df) df$Sample_ID))

# Keep only rows with common Sample_ID
filtered_frames <- lapply(list_of_frames, function(df) df[df$Sample_ID %in% common_samples, ])

# Merge the filtered dataframes based on Sample_ID
merged_data <- Reduce(function(x, y) merge(x, y, by = "Sample_ID"), filtered_frames)
ADD COMMENT
0
Entering edit mode

what about function df?

ADD REPLY
0
Entering edit mode

What about it? It is just an anonymous function (similar to lambda in python), df is the name of the input argument, could be x or whichever other name you prefer. It is of course possible to extract the function, and then apply it with lapply. E.g:

extract_sample_id <- function(df) df$Sample_ID
common_samples <- Reduce(intersect, lapply(list_of_frames, extract_sample_id))
ADD REPLY

Login before adding your answer.

Traffic: 2016 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6