Question

Find similar values in two gene lists

5.4 years ago

Nicky • 0

Hi guys, I have two very large gene data sets, and I want to extract all the values that appear in both lists.

I haven't been successful so far. This is my code:

    list1 = ("1_10.txt")
    list2 = ("1_10.txt")

    ID <- match(list1,list2)
    result1 <- list2[na.omit(ID)]
    unique(result1)
    write.csv(ID,file="matchedresults1.txt")




   list1 
        EnsemblGeneID
         ENSG00000109573
         ENSG00000205003
         ENSG00000124603
         ENSG00000008313
         ENSG00000183043
         ENSG00000179863
         ENSG00000141337
         ENSG00000154257

     list2 
        EnsemblGeneID
         ENSG00000109573
         ENSG00000205003
         ENSG00000124603
         ENSG00000008313
         ENSG00000183043
         ENSG00000179863
         ENSG00000141338
         ENSG00000154258

I expected to see the following extracted data:

     list3 
        EnsemblGeneID
         ENSG00000109573
         ENSG00000205003
         ENSG00000124603
         ENSG00000008313
         ENSG00000183043
         ENSG00000179863

Thanks for reading.

R coding • 1.5k views

ADD COMMENT • link updated 5.4 years ago by 2nelly ▴ 350 • written 5.4 years ago by Nicky • 0

intersect()
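
A minimal sketch of how intersect() could be applied here, assuming the IDs are already in two character vectors. The vectors below are typed in from the example in the question; with real files they would come from something like read.delim("yourfile.txt")$EnsemblGeneID, where the file name is only a placeholder.

```r
# IDs copied from the question's example lists.
list1 <- c("ENSG00000109573", "ENSG00000205003", "ENSG00000124603",
           "ENSG00000008313", "ENSG00000183043", "ENSG00000179863",
           "ENSG00000141337", "ENSG00000154257")
list2 <- c("ENSG00000109573", "ENSG00000205003", "ENSG00000124603",
           "ENSG00000008313", "ENSG00000183043", "ENSG00000179863",
           "ENSG00000141338", "ENSG00000154258")

# intersect() keeps the unique values that occur in both vectors.
list3 <- intersect(list1, list2)
print(list3)

# Write the shared IDs to a file (output name taken from the question).
out_file <- file.path(tempdir(), "matchedresults1.txt")
write.csv(data.frame(EnsemblGeneID = list3), out_file, row.names = FALSE)
```

With the example data this keeps the six IDs that the two lists share and drops the last two of each list, which differ in their final digit.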

ADD REPLY • link 5.4 years ago by ATpoint 86k

Hi there, I have a long list of genes in two files.

Let me try your script

ADD REPLY • link 5.4 years ago by Nicky • 0

Nicky: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

ADD REPLY • link 5.4 years ago by GenoMax 148k

score 0 · Answer 1 · 2019-08-13

5.4 years ago

piyushjo ▴ 710

Try this

list1<-read.delim("1_10.txt")
list2<-read.delim("1_10.txt")

Here both of your lists are read from exactly the same file. Do you have a long list of genes in two files, or is it a matrix? The following will work for a list with one column. First convert the factors into a character vector.

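# read.delim() returns a data.frame, so extract the ID column as a character vector.
# EnsemblGeneID is the column heading in your example; if the file has no header line,
# read it with header = FALSE and use V1 instead.
ids1 <- as.character(list1$EnsemblGeneID)
ids2 <- as.character(list2$EnsemblGeneID)
keep <- ids1 %in% ids2          # TRUE where an ID from list1 also occurs in list2
sel <- ids1[keep]
sel <- sel[!duplicated(sel)]    # drop repeated IDs
write.csv(sel, "matchedresults1.txt")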

ADD COMMENT • link 5.4 years ago by piyushjo ▴ 710

Hi there, I have a long list of genes in two files.

With your script I got the following result:

    "","x"
    "1","1_10.txt"

No idea what it means :(

ADD REPLY • link 5.4 years ago by Nicky • 0

You first need to have files with those names in the working folder. list1 and list2 are the two lists that you have given in your example.
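
The output above is consistent with list1 still holding the file name as a string, as in the code from the question, rather than the contents of the file. A small sketch of the difference, using a temporary stand-in file with two of the example IDs (the temporary file and the variable names are only for illustration):

```r
# Assigning the name in parentheses only stores the string itself.
as_string <- ("1_10.txt")
is.character(as_string)   # a length-one character vector, not gene IDs
write.csv(as_string)      # writes the file name itself, not gene IDs

# Stand-in for a real gene list file: a header line plus two IDs.
tmp <- tempfile(fileext = ".txt")
writeLines(c("EnsemblGeneID", "ENSG00000109573", "ENSG00000205003"), tmp)

# read.delim() actually opens the file and returns a data.frame.
genes <- read.delim(tmp, stringsAsFactors = FALSE)
ids <- genes$EnsemblGeneID
print(ids)
```

If the real file is not in R's working directory, read.delim() will stop with an error instead; getwd() and list.files() show where R is looking.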

ADD REPLY • link 5.4 years ago by piyushjo ▴ 710

score 0 · Answer 2 · 2019-08-13

5.4 years ago

2nelly ▴ 350

You can try this

awk 'NR==FNR {end[$1]; next} ($1 in end)' list1 list2

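This prints the lines of list2 whose first column also appears in the first column of list1. If you want to match on different columns, change the first $1 to the column of list1 you want to compare, or the second $1 to the column of list2 you want to match against.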

ADD COMMENT • link 5.4 years ago by 2nelly ▴ 350