I have mouse gene expression data, but I am unable to convert the Ensembl IDs to gene names with this code:
# Load required libraries
library(biomaRt)
# Read the gene expression data
expression_data <- read.delim("combined_expression.tsv", header = TRUE, sep = "\t")
# Extract Ensembl IDs from the first column
ensembl_ids <- expression_data[, 1] # Assuming the first column contains Ensembl IDs
# Use biomaRt to map Ensembl IDs to gene names
mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# Get gene names corresponding to Ensembl IDs
annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                    filters = "ensembl_gene_id",
                    values = ensembl_ids,
                    mart = mart)
# Merge the annotation with the expression data
# Match annotation based on Ensembl IDs
annotated_data <- merge(annotation, expression_data, by.x = "ensembl_gene_id", by.y = colnames(expression_data)[1], all.y = TRUE)
# Save the updated expression data as a CSV file
write.csv(annotated_data, "updated_expression_data.csv", row.names = FALSE)
# Print a message indicating completion
print("Gene names successfully mapped and saved to updated_expression_data.csv.")
But it shows NA in the external_gene_name column of the updated CSV file.
I tried again with the first 10 Ensembl IDs:
test_ids <- head(ensembl_ids, 10)
test_annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                         filters = "ensembl_gene_id",
                         values = test_ids,
                         mart = mart)
print(test_annotation)
But it returns:
ensembl_gene_id external_gene_name
<0 rows> (or 0-length row.names)
[1] "ENSMUST00000000001.5" "ENSMUST00000000003.14" "ENSMUST00000000010.9" "ENSMUST00000000028.14" "ENSMUST00000000033.12" "ENSMUST00000000049.6"
[7] "ENSMUST00000000058.7" "ENSMUST00000000080.8" "ENSMUST00000000087.13" "ENSMUST00000000090.8"
These are the first 10 Ensembl IDs. I think the problem is that they contain a decimal point (a version suffix), and that is the main reason the names are not being fetched.
Those are transcript IDs, not gene IDs. That may be at least part of the issue.
In my experience some of these methods also do not like having the version information (the "." and the numbers following it) included, so you may wish to trim that information off as well. I don't know whether or not that applies to biomaRt, but it might.
From Mike Smith, the developer of biomaRt: Mapping Ensembl Gene IDs with dot suffix
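As a rough illustration of the two points above, here is a minimal base R sketch that trims a trailing version suffix and checks whether each ID looks like a transcript or a gene. The example vector is made up from IDs quoted in this thread, and the prefix check (ENSMUST for mouse transcripts, ENSMUSG for mouse genes) is only a quick heuristic, not a validation against Ensembl.

```r
# Illustrative IDs: two versioned transcript IDs and one unversioned gene ID.
ids <- c("ENSMUST00000000001.5", "ENSMUST00000000003.14", "ENSMUSG00000064336")

# Remove a trailing ".<digits>" version suffix; IDs without one are left unchanged.
ids_no_version <- sub("\\.[0-9]+$", "", ids)

# Classify by prefix: ENSMUST = mouse transcript, ENSMUSG = mouse gene.
id_type <- ifelse(grepl("^ENSMUST", ids_no_version), "transcript",
           ifelse(grepl("^ENSMUSG", ids_no_version), "gene", "unknown"))

print(data.frame(id = ids_no_version, type = id_type))
```

If the type column says "transcript", filtering on ensembl_gene_id cannot match anything, whether or not the version has been trimmed.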
So if they are transcript IDs, how can we convert them into gene names?
If you trim off the version information using the instructions in GenoMax's link above, and then filter on ensembl_transcript_id, it should work. Here is an example that worked for me, returning the Gene ID, Transcript ID and Gene Name successfully.
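A minimal sketch of this approach (not necessarily the exact code used in the reply above): strip the version suffix, query biomaRt with ensembl_transcript_id as the filter, and merge the result back onto the expression table. The helper names fetch_from_ensembl and annotate_expression, and the fetch argument, are hypothetical and introduced only for illustration. The sketch assumes the first column of the table holds the versioned transcript IDs, as in the question.

```r
# Look up gene ID and gene name for unversioned mouse transcript IDs.
# Needs the biomaRt package and network access when it is actually called.
fetch_from_ensembl <- function(ids) {
  mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "ensembl_transcript_id", "external_gene_name"),
    filters = "ensembl_transcript_id",
    values = ids,
    mart = mart
  )
}

# Annotate a table whose first column holds (possibly versioned) transcript IDs.
# `fetch` is a hypothetical hook so the lookup can be replaced, e.g. for offline checks.
annotate_expression <- function(expression_data, fetch = fetch_from_ensembl) {
  # Add a version-free key to join on.
  expression_data$ensembl_transcript_id <- sub("\\.[0-9]+$", "", expression_data[[1]])
  annotation <- fetch(unique(expression_data$ensembl_transcript_id))
  # all.x = TRUE keeps rows with no match; their gene columns will be NA.
  merge(expression_data, annotation, by = "ensembl_transcript_id", all.x = TRUE)
}

# Usage with the file from the question (not run here):
# expression_data <- read.delim("combined_expression.tsv", header = TRUE, sep = "\t")
# annotated_data <- annotate_expression(expression_data)
# write.csv(annotated_data, "updated_expression_data.csv", row.names = FALSE)
```

Some NA values may remain after this, for example for transcripts that are not present in the Ensembl release being queried. listFilters(mart) shows the filters the chosen dataset supports, which is a quick way to check whether a versioned filter is available as an alternative to trimming.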
ENSMUSG00000064336: I feel this is not a mouse annotation?
It is mouse: https://www.ensembl.org/Multi/Search/Results?q=ENSMUSG00000064336
It's mouse, sorry. But I feel Ensembl did not fetch it properly.