Getting the common genes between two lists of genes
8.9 years ago
zizigolu ★ 4.3k

Hi,

I have two lists of genes:

> mycounts <- read.csv("geneID and length.csv", header = T, sep = "\t", stringsAsFactors = FALSE)
> colnames(mycounts)
[1] "genesID.geneslength"

> head(mycounts[1:4,])
[1] "R0010W,1272" "R0020C,1122" "R0030W,546"  "R0040C,891"

> dim(mycounts)
[1] 7130    1
> mycounts1 <- read.table("read.txt", header = T, sep = "\t", stringsAsFactors = FALSE)
> dim(mycounts1)
[1] 5961    1
> colnames(mycounts1)
[1] "Freq"

How can I keep only the genes in my gene file that also appear in my read file? The gene file has 7130 genes, and I need only the 5961 that are in the read file.

Could you help me, please? Thank you.

gene R • 3.3k views

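Antonio is right: you are calling read.csv on a comma-separated file but then setting sep = "\t".

That's just wrong. The file contains no tabs, so each line is read as a single field, which is why you end up with only one column.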
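
To illustrate the point, here is a minimal sketch that reads the same comma-separated text twice, once with sep = "\t" and once with the read.csv default. The inline text is a small stand-in built from the four rows shown in the question, and the header line is an assumption about what the original file looks like.

```r
# Stand-in for "geneID and length.csv"; the header line is an assumption
csv_text <- "genesID,geneslength\nR0010W,1272\nR0020C,1122\nR0030W,546\nR0040C,891"

# As in the question: tab separator on comma-separated text
wrong <- read.csv(text = csv_text, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Default separator for read.csv is a comma
right <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

dim(wrong)       # one column: name and length stay fused
colnames(wrong)  # "genesID.geneslength", as seen in the question
dim(right)       # two columns: genesID and geneslength
```

With the file on disk, dropping the sep argument (or setting sep = ",") should give the same two-column result, provided the file really is comma-separated.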

8.9 years ago
zx8754 12k

See the %in% operator, e.g.:

data1_subset <- data1[data1$gene %in% data2$gene, ]

Or we can use merge, e.g.:

data1_merge <- merge(data1, data2, by = "gene")
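
A small worked sketch of both approaches on toy data may help. The gene IDs and lengths come from the rows shown in the question; the Freq values and the choice of which genes appear in data2 are made up for illustration, and both data frames are assumed to have a column named gene.

```r
# Toy gene table (IDs and lengths taken from the question)
data1 <- data.frame(
  gene   = c("R0010W", "R0020C", "R0030W", "R0040C"),
  length = c(1272, 1122, 546, 891),
  stringsAsFactors = FALSE
)

# Toy read table (hypothetical Freq values)
data2 <- data.frame(
  gene = c("R0020C", "R0040C"),
  Freq = c(10, 25),
  stringsAsFactors = FALSE
)

# Keep rows of data1 whose gene also occurs in data2
data1_subset <- data1[data1$gene %in% data2$gene, ]

# Inner join on the shared column; also brings in Freq
data1_merge <- merge(data1, data2, by = "gene")
```

The %in% version keeps only the columns of data1, while merge returns the columns of both tables for the shared genes.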

Excuse me, I could not work out what data1, data2, and the gene column refer to here.

Could you please explain them in terms of my example above?

Thank you.


data1 is the first file (mycounts), data2 is the second file (mycounts1), and "gene" in by = "gene" should be replaced by the name of the column that holds the gene IDs. With by, that column must have the same name in both data frames.


Even a simple Venn diagram or advanced filtering should work.
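
The regions of a two-set Venn diagram can be computed directly with base R set functions. This is only a sketch on made-up vectors: genes_a reuses IDs from the question, while genes_b and the ID "R0050X" are hypothetical.

```r
genes_a <- c("R0010W", "R0020C", "R0030W", "R0040C")
genes_b <- c("R0020C", "R0040C", "R0050X")  # hypothetical second list

common <- intersect(genes_a, genes_b)  # overlap of the two circles
only_a <- setdiff(genes_a, genes_b)    # present only in the first list
only_b <- setdiff(genes_b, genes_a)    # present only in the second list

length(common)
length(only_a)
length(only_b)
```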

8.9 years ago

Something looks odd here: each gene name appears to be joined to its length, separated by a comma. Am I right? I think so, because dim() shows only one column. You probably need to call read.csv differently, for example with a different sep value, so that the two values end up in separate columns. I would need to know the format of the original file to suggest exactly how.

In that case none of the suggestions here will work as they stand: the gene names may be shared between the files, but with the name and the length fused into one string, the length would also have to match for a comparison or merge to succeed.
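
As a rough sketch of the problem and one way around it, the fused strings below are the four values printed in the question, and read_ids is a hypothetical vector standing in for the gene IDs of the read file. Splitting on the comma is an assumption about the format; re-reading the file with the right separator would achieve the same thing.

```r
fused    <- c("R0010W,1272", "R0020C,1122", "R0030W,546", "R0040C,891")
read_ids <- c("R0020C", "R0040C")  # hypothetical IDs from the read file

# Fused "name,length" strings never equal a bare gene ID
sum(fused %in% read_ids)

# Split each string at the comma into a two-column character matrix
parts <- do.call(rbind, strsplit(fused, ",", fixed = TRUE))

genes <- data.frame(
  genesID     = parts[, 1],
  geneslength = as.integer(parts[, 2]),
  stringsAsFactors = FALSE
)

# Matching now works on the gene ID alone
kept <- genes[genes$genesID %in% read_ids, ]
```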

8.9 years ago
EagleEye 7.6k

If the column names used for merging differ between the two files:

Common <- merge(mycounts, mycounts1, by.x = "colNameFrom-mycounts", by.y = "colNameFrom-mycounts1")

If the column names used for merging are the same in both files:

Common <- merge(mycounts, mycounts1, by = "colName")
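
For a concrete picture of the by.x / by.y case, here is a sketch on toy data. It assumes mycounts has already been read into separate genesID and geneslength columns, and that mycounts1 has a gene ID column called gene next to Freq; that column name and the Freq values are hypothetical, since the question only shows a Freq column.

```r
mycounts <- data.frame(
  genesID     = c("R0010W", "R0020C", "R0030W", "R0040C"),
  geneslength = c(1272, 1122, 546, 891),
  stringsAsFactors = FALSE
)

mycounts1 <- data.frame(
  gene = c("R0020C", "R0040C"),  # hypothetical ID column name
  Freq = c(10, 25),              # hypothetical counts
  stringsAsFactors = FALSE
)

# Join on differently named key columns; the result keeps the by.x name
Common <- merge(mycounts, mycounts1, by.x = "genesID", by.y = "gene")
```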