Question

How to merge two files without duplicating shared columns

15 months ago

Bioinfonext ▴ 470

Dear all,

I am using the commands below to merge two files on a shared column, but the merge also duplicates the other columns the files have in common:

    file1 <- read.table("CpG.csv", header=TRUE, sep=",", as.is=TRUE, na.strings="NA")
    file1[c(1:3), c(1:3)]
       Sample_ID       PC1      PC2
    1 NSS.1.0093 -25382.95 22243.17
    2 NSS.1.0095 -29640.00 27610.33
    3 NSS.1.0096 -41261.36 30188.37

    file2 <- read.table("Phe_121023.csv", header=TRUE, sep=",", as.is=TRUE, na.strings="NA")
    file2[c(1:3), c(1:3)]
       Sample_ID         BeacChip.ID   Sentrix_ID
    1 NSS.1.0093 200772280026_R05C01 200772280026
    2 NSS.1.0095 200772280026_R07C01 200772280026
    3 NSS.1.0096 200772280026_R08C01 200772280026

    PCs <- read.table("Control_probe_PCs_all_preprocessed.txt", header=TRUE, sep="\t", as.is=TRUE, na.strings="NA")
    PCs[c(1:3), c(1:3)]
       Sample_ID       PC1      PC2
    1 NSS.1.0093 -25382.95 22243.17
    2 NSS.1.0095 -29640.00 27610.33
    3 NSS.1.0096 -41261.36 30188.37

    tmp <- merge(file1, file2, by="Sample_ID")

    write.table(tmp, file="your_merged_commonData.txt", sep="\t")

R statistics biostatistics • 728 views

15 months ago

DBScan ▴ 480

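You can use a join function from the dplyr package, like this:

library(dplyr)
inner_join(file1, PCs)

This keeps only the samples that occur in both data frames. Because no by argument is given, inner_join() joins on every column the two data frames have in common, so those columns are not duplicated.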
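Base R's merge() offers the same kind of natural join: when by is omitted, it defaults to joining on every column name the two data frames share. A minimal sketch, assuming small made-up data frames (toy_cpg and toy_pcs are hypothetical stand-ins for file1 and PCs; the PC values are copied from the question, PC3 is invented):

```r
# Toy stand-ins for file1 and PCs (hypothetical names; PC values from the question).
toy_cpg <- data.frame(
  Sample_ID = c("NSS.1.0093", "NSS.1.0095", "NSS.1.0096"),
  PC1 = c(-25382.95, -29640.00, -41261.36),
  PC2 = c(22243.17, 27610.33, 30188.37),
  stringsAsFactors = FALSE
)
toy_pcs <- toy_cpg[1:2, ]      # same columns, but only two of the samples
toy_pcs$PC3 <- c(0.5, -0.5)    # one extra, made-up column

# With no `by`, merge() joins on intersect(names(x), names(y)),
# here Sample_ID, PC1 and PC2, so none of them is duplicated.
natural <- merge(toy_cpg, toy_pcs)
colnames(natural)
# "Sample_ID" "PC1" "PC2" "PC3"
```

Joining on numeric columns requires their values to match exactly in both files. If the shared columns might differ slightly, joining on Sample_ID alone and dropping the duplicated columns afterwards is the safer route.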

score 2 · Accepted Answer · 2024-01-15

Yes, indeed: when both data.frames contain a column with the same name that is not used as the merge key, merge() appends the suffixes .x and .y to those column names.
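The suffix behaviour can be reproduced with a small sketch. The data frames left_df and right_df are hypothetical; their values are copied from the question:

```r
# Two toy data frames (hypothetical names) that share Sample_ID, PC1 and PC2.
left_df <- data.frame(
  Sample_ID = c("NSS.1.0093", "NSS.1.0095"),
  PC1 = c(-25382.95, -29640.00),
  PC2 = c(22243.17, 27610.33),
  stringsAsFactors = FALSE
)
right_df <- left_df

# Only Sample_ID is used as the key, so PC1 and PC2 from both inputs
# are kept and told apart by the .x / .y suffixes.
dup <- merge(left_df, right_df, by = "Sample_ID")
colnames(dup)
# "Sample_ID" "PC1.x" "PC2.x" "PC1.y" "PC2.y"
```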

It is not up to R to guess which columns you would like to keep. Either exclude the duplicated columns from one data.frame before merging, or drop them afterwards. For example, this drops the columns ending in .y and strips the .x suffix from the remaining ones:

tmp <- tmp[, !grepl("\\.y$", colnames(tmp)), drop = FALSE]
colnames(tmp) <- sub("\\.x$", "", colnames(tmp))
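The other option, excluding the shared columns before merging, could be sketched as follows. The data frames df_a and df_b and the Batch column are made up for illustration; the PC values are copied from the question:

```r
# Toy data frames (hypothetical names); df_b repeats PC1/PC2 and adds Batch.
df_a <- data.frame(
  Sample_ID = c("NSS.1.0093", "NSS.1.0095"),
  PC1 = c(-25382.95, -29640.00),
  PC2 = c(22243.17, 27610.33),
  stringsAsFactors = FALSE
)
df_b <- data.frame(
  Sample_ID = c("NSS.1.0093", "NSS.1.0095"),
  PC1 = c(-25382.95, -29640.00),
  PC2 = c(22243.17, 27610.33),
  Batch = c("A", "B"),   # made-up column that exists only in df_b
  stringsAsFactors = FALSE
)

# Keep the key plus only those columns of df_b that df_a does not already have.
extra_cols <- setdiff(names(df_b), names(df_a))
merged <- merge(df_a, df_b[, c("Sample_ID", extra_cols), drop = FALSE],
                by = "Sample_ID")
colnames(merged)
# "Sample_ID" "PC1" "PC2" "Batch"
```

Because nothing but the key overlaps at merge time, no .x or .y suffixes are created and no clean-up step is needed.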