Command to find entries common to three files
3.2 years ago
harry ▴ 40

I have three text files and want to find the entries common to all three, as well as the differences between them. The files look like this:

1st file:

hsa_circ_0072810
hsa_circ_0072811
hsa_circ_0072813
hsa_circ_0098750
hsa_circ_0125807
hsa_circ_0000295
hsa_circ_0134603
hsa_circ_0001196
hsa_circ_0097585
hsa_circ_0097586
hsa_circ_0006118
hsa_circ_0080950
hsa_circ_0102355
hsa_circ_0000175
hsa_circ_0000934
hsa_circ_0125807

2nd file:

hsa_circ_0072810
hsa_circ_0072811
hsa_circ_0072813
hsa_circ_0098750
hsa_circ_0017672
hsa_circ_0040452
hsa_circ_0098687
hsa_circ_0000400
hsa_circ_0004055
hsa_circ_0006620
hsa_circ_0006118
hsa_circ_0080950
hsa_circ_0102355
hsa_circ_0000175
hsa_circ_0000934
hsa_circ_0125807

3rd file:

hsa_circ_0072810
hsa_circ_0072811
hsa_circ_0072813
hsa_circ_0098750
hsa_circ_0110890
hsa_circ_0001611
hsa_circ_0001675
hsa_circ_0002937
hsa_circ_0004932
hsa_circ_0002393
hsa_circ_0116839
hsa_circ_0072850
hsa_circ_0072848
hsa_circ_0131605
hsa_circ_0001826
hsa_circ_0080696

Could you please tell me how to extract the entries common to all three files and the differences between them?

Thanks in advance

grep • 1.6k views
3.2 years ago

All combinations among the 3 files: http://www.interactivenn.net/. The output image and data can be downloaded.


In R, for all combinations of comparisons:

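library(gplots)
# list the .txt files; `pattern` is a regular expression, not a shell glob
files = list.files(pattern = "\\.txt$", full.names = TRUE)
# read each file as a character vector of IDs (one per line)
sets = lapply(files, readLines)
names(sets) = basename(files)
print(venn(sets))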

PS: https://bioinformatics.psb.ugent.be/webtools/Venn/ is an expired link and no longer works.

3.2 years ago
Mensur Dlakic ★ 28k
grep -w -f 1st 2nd | grep -w -f - 3rd > common

You will have to define better what you mean by the difference between 3 files.
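
As an alternative to chaining grep, the common entries can also be sketched with sort and comm. This is only a minimal example under assumptions: the sample files are a trimmed, hypothetical subset of the data in the question, written to a temporary directory, and each file holds one ID per line.

```shell
# Hypothetical sample files standing in for 1st, 2nd and 3rd
workdir=$(mktemp -d)
printf '%s\n' hsa_circ_0072810 hsa_circ_0125807 hsa_circ_0000295 hsa_circ_0125807 > "$workdir/1st"
printf '%s\n' hsa_circ_0072810 hsa_circ_0125807 hsa_circ_0017672 > "$workdir/2nd"
printf '%s\n' hsa_circ_0072810 hsa_circ_0110890 > "$workdir/3rd"

# comm needs sorted input; sort -u also drops IDs repeated within a file
for f in 1st 2nd 3rd; do
  sort -u "$workdir/$f" > "$workdir/$f.sorted"
done

# comm -12 prints only lines present in both inputs; chain it to cover three files
comm -12 "$workdir/1st.sorted" "$workdir/2nd.sorted" \
  | comm -12 - "$workdir/3rd.sorted" > "$workdir/common"

cat "$workdir/common"
```

Unlike grep -f, comm compares whole lines exactly, so an ID can never match as part of a longer one.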


or in R:

Reduce(
  intersect, list(vect1, vect2, vect3)
)
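# entries unique to each file; -w and -F match whole IDs literally, not as regex substrings
grep -v -w -F -f 2nd 1st | grep -v -w -F -f 3rd - > 1st_unique
grep -v -w -F -f 1st 2nd | grep -v -w -F -f 3rd - > 2nd_unique
grep -v -w -F -f 1st 3rd | grep -v -w -F -f 2nd - > 3rd_unique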

Thanks for replying. By "difference" I mean the entries that are not present in any other text file, i.e. those that are unique to one particular file.
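
With that definition (an entry counts as a "difference" if it occurs in exactly one file), a possible sketch uses sort, uniq and comm. The sample files are a hypothetical, trimmed subset of the data in the question, and the output names `*_unique` are chosen only for illustration.

```shell
# Hypothetical sample files standing in for 1st, 2nd and 3rd
workdir=$(mktemp -d)
printf '%s\n' hsa_circ_0072810 hsa_circ_0125807 hsa_circ_0000295 hsa_circ_0125807 > "$workdir/1st"
printf '%s\n' hsa_circ_0072810 hsa_circ_0125807 hsa_circ_0017672 > "$workdir/2nd"
printf '%s\n' hsa_circ_0072810 hsa_circ_0110890 > "$workdir/3rd"

# Deduplicate within each file first, so a repeated ID is not counted twice
for f in 1st 2nd 3rd; do
  sort -u "$workdir/$f" > "$workdir/$f.sorted"
done

# After per-file dedup, an ID seen once overall lives in exactly one file;
# uniq -u keeps only lines that are not repeated
sort "$workdir"/*.sorted | uniq -u > "$workdir/singletons"

# Assign each singleton back to the file it came from
for f in 1st 2nd 3rd; do
  comm -12 "$workdir/$f.sorted" "$workdir/singletons" > "$workdir/${f}_unique"
done

cat "$workdir/1st_unique"
```

The same loop should work for any number of files, as long as each one is sorted and deduplicated before the merge.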

3.2 years ago
ponganta ▴ 590

In R, you could use this function to obtain the unique values for each file:


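#' Find distinct entries per list element
#'
#' @param ls A `list` of vectors of the same data type
#' @return A `list` holding, for each vector, the elements found in no other vector
elements_distinct = function(ls){
  # seq_along() also behaves correctly for an empty list, unlike 1:length(ls)
  lapply(seq_along(ls), function(x, y)
    # keep the entries of vector x that are absent from all other vectors
    y[[x]][!y[[x]] %in% unique(unlist(y[-x]))],
    y = ls
  )
}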

The advantage of this approach would be scalability. It doesn't matter whether you have 3, 9, or 9000 sets to compare.
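
For comparison, a similar any-number-of-files approach could be sketched on the command line with a single awk pass. This is an assumption-laden example rather than a tested pipeline: the sample files are hypothetical, and the tab-separated output name `unique_by_file.tsv` is made up for illustration.

```shell
# Hypothetical sample files; any number of files can be listed on the awk line
workdir=$(mktemp -d)
printf '%s\n' hsa_circ_0072810 hsa_circ_0125807 hsa_circ_0000295 hsa_circ_0125807 > "$workdir/1st"
printf '%s\n' hsa_circ_0072810 hsa_circ_0125807 hsa_circ_0017672 > "$workdir/2nd"
printf '%s\n' hsa_circ_0072810 hsa_circ_0110890 > "$workdir/3rd"

# Count in how many files each ID occurs (once per file), remember the file,
# then print "file<TAB>id" for IDs that occur in exactly one file
(
  cd "$workdir" &&
  awk '!seen[FILENAME, $0]++ { count[$0]++; origin[$0] = FILENAME }
       END { for (id in count) if (count[id] == 1) print origin[id] "\t" id }' \
      1st 2nd 3rd | sort
) > "$workdir/unique_by_file.tsv"

cat "$workdir/unique_by_file.tsv"
```

The trade-off is that awk keeps every distinct ID in memory, which should be fine for ID lists of this size.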


Follow-up: example usage (in an R project, which I would highly recommend for any type of data munging!). Within the project, put your files into a folder called "data".


# PACKAGES ----------------------
library(magrittr) # for the pipe
library(tools)    # for file_path_sans_ext

# FILES -------------------------
## list all filepaths
myfiles = list.files("data", full.names = TRUE) 

## get filenames (no path, no extension)
mynames = list.files("data")  %>% 
  file_path_sans_ext()

# ANALYSES -----------------------
## load your files into a list. Each list element will be named after the file.
mylist = lapply(myfiles, readLines) %>% setNames(mynames)

## find distinct elements
mysnowflakes = elements_distinct(mylist)