I'm a noob, so I apologize for what is probably a very basic question, but I can't quite figure out how to do what I'm trying to do correctly. I also don't think I have the vocabulary to accurately explain what it is I'm confused about, so I apologize in advance.
I have successfully replaced Ensembl IDs with gene symbols from MGI numerous times with biomaRt. However, I am struggling with this count file, which has versioned Ensembl IDs.
I can remove the version numbers easily using the following, and then I can use biomaRt to successfully convert the Ensembl IDs into symbols:
df <- read.csv("Tuveson_counts_LRT.csv", sep=",")
head(df)
X baseMean log2FoldChange lfcSE stat pvalue padj significant
1 ENSMUSG00000000486.7 1.3283025 -0.78624588 1.5531561 0.4214789 0.9806809 NA <NA>
2 ENSMUSG00000079557.4 31.1085926 0.08715468 0.3561105 2.7204579 0.6056395 0.9999994 <NA>
3 ENSMUSG00000026276.10 118.3799877 -0.02395615 0.1968759 0.5095415 0.9725655 0.9999994 <NA>
4 ENSMUSG00000032656.8 5.8821849 -0.15815182 0.7890379 0.2655061 0.9919307 0.9999994 <NA>
5 ENSMUSG00000022456.9 0.9019521 -1.93237167 2.0918497 1.4395258 0.8372970 NA <NA>
6 ENSMUSG00000020486.11 5.8367904 0.12988447 0.7918816 0.6535026 0.9569368 0.9999994 <NA>
genes <- df$X
genes <- gsub("\\..*","", genes)
head(genes)
[1] "ENSMUSG00000000486" "ENSMUSG00000079557" "ENSMUSG00000026276" "ENSMUSG00000032656" "ENSMUSG00000022456"
[6] "ENSMUSG00000020486"
library(biomaRt)
mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
G_list <- getBM(filters = "ensembl_gene_id",
                attributes = c("ensembl_gene_id", "mgi_symbol"),
                values = genes,
                mart = mart)
head(G_list)
ensembl_gene_id mgi_symbol
1 ENSMUSG00000000028 Cdc45
2 ENSMUSG00000000058 Cav2
3 ENSMUSG00000000088 Cox5a
4 ENSMUSG00000000127 Fer
5 ENSMUSG00000000142 Axin2
6 ENSMUSG00000000148 Brat1
Usually I would use the following to merge the output from G_list and the original df, but that won't work now since the "renamed" values are in genes rather than in the df$X column.
counts_symbol <- merge(df, G_list, by.x ="X", by.y="ensembl_gene_id")
head(counts_symbol)
[1] X baseMean log2FoldChange lfcSE stat pvalue padj
[8] significant mgi_symbol
<0 rows> (or 0-length row.names)
So how do I change the actual column X in df so that the version numbers are removed, and so the merge works correctly?
TIA!
yup, that'll do it! tysm
but can you explain the difference between why this didn't work
but this did?
genes <- df$X
copies the data from the X column and assigns this copy to the genes variable. Since you were operating on a copy of part of the original data.frame, and not the original data.frame itself, the original data.frame remained unchanged. You could have gone back and modified the original data.frame by adding this third line of code to what you have above:
df$X <- genes
This overwrites the old X column with the modified X column data saved in the genes variable.
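To make the copy-versus-original point concrete, here is a minimal sketch that runs without biomaRt or the real count file. The two small data frames are made-up stand-ins (the version suffixes and baseMean values are invented for illustration, and the toy G_list only mimics the shape of the getBM() output); only the gsub() and merge() calls mirror the code in the question.

```r
# Toy stand-in for the results table; version suffixes and values are made up.
df <- data.frame(
  X = c("ENSMUSG00000000028.15", "ENSMUSG00000000058.6"),
  baseMean = c(10.5, 3.2),
  stringsAsFactors = FALSE
)

# Toy stand-in for the getBM() output (unversioned IDs plus symbols).
G_list <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000000028", "ENSMUSG00000000058"),
  mgi_symbol = c("Cdc45", "Cav2"),
  stringsAsFactors = FALSE
)

# Stripping versions on a copy: genes changes, df$X does not.
genes <- df$X
genes <- gsub("\\..*", "", genes)
unchanged <- identical(df$X, c("ENSMUSG00000000028.15", "ENSMUSG00000000058.6"))

# Merging now compares versioned IDs against unversioned ones: no matches.
before <- merge(df, G_list, by.x = "X", by.y = "ensembl_gene_id")

# Write the cleaned IDs back into the data frame itself.
# Equivalent one-step form: df$X <- gsub("\\..*", "", df$X)
df$X <- genes

# The keys now agree, so the merge returns one row per gene.
after <- merge(df, G_list, by.x = "X", by.y = "ensembl_gene_id")
print(after)
```

If you would rather keep the versioned IDs around, one option is to store the stripped IDs in a new column (for example a hypothetical df$gene_id) and merge on that column instead of overwriting X.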