Question

getting the common genes between two lists of genes

1

9.1 years ago

zizigolu ★ 4.3k

Hi,

I have two lists of genes

mycounts <- read.csv("geneID and length.csv", header = T, sep = "\t", stringsAsFactors = FALSE)

> colnames(mycounts)
[1] "genesID.geneslength"

> head(mycounts[1:4,])
[1] "R0010W,1272" "R0020C,1122" "R0030W,546"  "R0040C,891"

> dim(mycounts)
[1] 7130    1

mycounts1 <- read.table("read.txt", header = T, sep = "\t", stringsAsFactors = FALSE)

> dim(mycounts1)
[1] 5961    1
> colnames(mycounts1)
[1] "Freq"

How can I keep only the genes in my genes file that are also in my read file? The genes file has 7130 genes, of which I only need 5961.

Could you help me, please? Thank you.

gene R • 3.5k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.1 years ago by zizigolu ★ 4.3k

2

Antonio is right: you are using read.csv on a comma-separated file, but then setting sep = "\t".

That's just wrong.

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 9.1 years ago by Michael 55k

2

9.1 years ago

EagleEye 7.6k

If the column names (for merging) differ between the two files:

Common <- merge(mycounts, mycounts1, by.x=c("colNameFrom-mycounts"), by.y=c("colNameFrom-mycounts1"))

If the column names (for merging) are the same in both files:

Common <- merge(mycounts, mycounts1, by="colName")
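As a minimal sketch of the first case, here is a merge on toy data. The data frames and the column names gene_id and id are made up for illustration; they are not the columns of the files in the question.

```r
# Toy data; the column names gene_id and id are hypothetical.
genes <- data.frame(gene_id = c("R0010W", "R0020C", "R0030W"),
                    length  = c(1272, 1122, 546),
                    stringsAsFactors = FALSE)
reads <- data.frame(id   = c("R0030W", "R0010W"),
                    Freq = c(15, 7),
                    stringsAsFactors = FALSE)

# merge() does an inner join by default: only genes present in both
# data frames are kept, and the key column takes its name from by.x.
common <- merge(genes, reads, by.x = "gene_id", by.y = "id")
print(common)
```

The result should contain one row per shared gene, with the columns of both inputs side by side.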

ADD COMMENT • link updated 5.2 years ago by Ram 44k • written 9.1 years ago by EagleEye 7.6k

Ram · Accepted Answer · 2015-12-30

3

9.1 years ago

zx8754 12k

See the %in% operator, e.g.:

data1_subset <- data1[ data1$genes %in% data2$genes, ]

Or we can use merge, e.g.:

data1_merge <- merge(data1, data2, by = "gene")
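To make the difference between the two approaches concrete, here is a small sketch on invented data. It assumes both data frames store the gene ID in a column called gene; the values are placeholders, not real counts.

```r
# Invented example data; both data frames are assumed to have a "gene" column.
data1 <- data.frame(gene   = c("R0010W", "R0020C", "R0030W", "R0040C"),
                    length = c(1272, 1122, 546, 891),
                    stringsAsFactors = FALSE)
data2 <- data.frame(gene = c("R0040C", "R0010W", "R9999X"),
                    Freq = c(3, 12, 5),
                    stringsAsFactors = FALSE)

# %in% gives one TRUE/FALSE per row of data1, so this is a pure row filter:
# the result keeps only the columns of data1.
keep <- data1$gene %in% data2$gene
data1_subset <- data1[keep, ]

# merge() also keeps only the shared genes, but adds the columns of data2.
data1_merge <- merge(data1, data2, by = "gene")

print(data1_subset)
print(data1_merge)
```

The %in% version is enough when only the filtered first table is needed; merge is the better fit when the values from the second table should be carried along.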

ADD COMMENT • link updated 5.2 years ago by Ram 44k • written 9.1 years ago by zx8754 12k

0

Excuse me, I could not figure out what data1, genes and data2 are here.

Could you please tell me, based on my example above?

Thank you.

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 9.1 years ago by zizigolu ★ 4.3k

2

data1 is the first file (mycounts), data2 is the second file (mycounts1), and by = "gene" should be replaced by the column's name. In this case, both columns have to have the same name.

ADD REPLY • link 9.1 years ago by Antonio R. Franco ★ 5.2k

1

Even a simple Venn diagram or advanced filtering should work.

ADD REPLY • link 9.1 years ago by kanwarjag ★ 1.2k

Ram · Accepted Answer · 2015-12-31

Something is odd here: the gene names appear to be joined to their lengths and separated by a comma. Am I right? I believe so, since dim() shows you have only one column. You likely need to call read.csv differently, for example with a different sep value, so that the two values are split into separate columns. I need to know the format of the original file to suggest how.

In that case, none of these suggestions will work as is: the gene names are shared between the files, but because each name is fused with its length, the whole string (name plus length) would have to match for a merge to work.
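A rough sketch of what that could look like, assuming the genes file really is comma-separated with the header genesID,geneslength (which is what the single column name genesID.geneslength suggests). The layout of read.txt is not shown in the question, so the second data frame below, with its gene IDs in a column called genesID, is purely hypothetical.

```r
# Stand-in for the contents of the genes file, so the example is
# self-contained. With the real file this would be
# read.csv("geneID and length.csv", stringsAsFactors = FALSE),
# leaving sep at its default of ",".
csv_text <- paste(c("genesID,geneslength",
                    "R0010W,1272",
                    "R0020C,1122",
                    "R0030W,546",
                    "R0040C,891"),
                  collapse = "\n")
mycounts <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The ID and the length now sit in two separate columns.
print(dim(mycounts))

# Hypothetical stand-in for the read file: assumed to hold gene IDs
# in a column named genesID.
mycounts1 <- data.frame(genesID = c("R0010W", "R0040C"),
                        stringsAsFactors = FALSE)

# Keep only the rows of the genes file whose ID also occurs in the read file.
filtered <- mycounts[mycounts$genesID %in% mycounts1$genesID, ]
print(filtered)
```

Once the gene ID is in its own column, either the %in% filter or merge from the other answers can be applied to it directly.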