Error in conversion of Ensembl IDs to gene names
4 weeks ago
anasjamshed ▴ 140

I have mouse gene expression data, but I am unable to convert the Ensembl IDs to gene names with this code:

    # Load required libraries
    library(biomaRt)

    # Read the gene expression data
    expression_data <- read.delim("combined_expression.tsv", header = TRUE, sep = "\t")

    # Extract Ensembl IDs from the first column
    ensembl_ids <- expression_data[, 1]  # Assuming the first column contains Ensembl IDs

    # Use biomaRt to map Ensembl IDs to gene names
    mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

    # Get gene names corresponding to Ensembl IDs
    annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                        filters = "ensembl_gene_id",
                        values = ensembl_ids,
                        mart = mart)

    # Merge the annotation with the expression data
    # Match annotation based on Ensembl IDs
    annotated_data <- merge(annotation, expression_data, by.x = "ensembl_gene_id", by.y = colnames(expression_data)[1], all.y = TRUE)

    # Save the updated expression data as a CSV file
    write.csv(annotated_data, "updated_expression_data.csv", row.names = FALSE)

    # Print a message indicating completion
    print("Gene names successfully mapped and saved to updated_expression_data.csv.")

But it shows NA in the external_gene_name column of the updated CSV file.

I tried again with just 10 Ensembl IDs:

test_ids <- head(ensembl_ids, 10)
test_annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                         filters = "ensembl_gene_id",
                         values = test_ids,
                         mart = mart)
print(test_annotation)

But it returns:

ensembl_gene_id    external_gene_name
<0 rows> (or 0-length row.names)
Ensembl BioMart • 710 views
4 weeks ago

Sounds like something is not matching, but it is not so easy to debug.

Instead, fetch all the annotations first and save them to a file (so that you don't have to rerun the query every time).

Basically, separate getting the annotations from using them; that way you'll better understand what you have.

library(biomaRt)

# Use biomaRt to map Ensembl IDs to gene names
mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

# Get gene names corresponding to Ensembl IDs
annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                    mart = mart)

write.csv(annotation, file = "annotation.csv", row.names = FALSE)

Now, in a different part of the code, load the annotations and figure out why your IDs do not match.
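
A minimal sketch of that second step, using only base R. The small `annotation` table and the `ensembl_ids` vector here are hand-written stand-ins (assumptions for illustration); in practice they would come from `read.csv("annotation.csv")` and from the first column of the expression data.

```r
# Toy stand-in for: annotation <- read.csv("annotation.csv")
annotation <- data.frame(
  ensembl_gene_id    = c("ENSMUSG00000000001", "ENSMUSG00000000003"),
  external_gene_name = c("Gnai3", "Pbsn"),
  stringsAsFactors   = FALSE
)

# Toy stand-in for: ensembl_ids <- expression_data[, 1]
ensembl_ids <- c("ENSMUST00000000001.5", "ENSMUST00000000003.14")

# How many of the IDs occur in the annotation table at all?
n_matched <- sum(ensembl_ids %in% annotation$ensembl_gene_id)

# Which IDs have no counterpart in the annotation table?
unmatched <- setdiff(ensembl_ids, annotation$ensembl_gene_id)

cat("matched:", n_matched, "of", length(ensembl_ids), "\n")

# Look at both sides next to each other to spot format differences
print(head(unmatched))
print(head(annotation$ensembl_gene_id))
```

If nothing matches, printing the two sets of IDs side by side usually makes the difference in format visible.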


 [1] "ENSMUST00000000001.5"  "ENSMUST00000000003.14" "ENSMUST00000000010.9"  "ENSMUST00000000028.14" "ENSMUST00000000033.12" "ENSMUST00000000049.6"
 [7] "ENSMUST00000000058.7"  "ENSMUST00000000080.8"  "ENSMUST00000000087.13" "ENSMUST00000000090.8"

These are the first 10 Ensembl IDs. The problem is that they contain a decimal point, and that's the main reason the names are not being fetched.


Those are transcript IDs, not gene IDs. That may be at least part of the issue.

In my experience some of these methods also do not like having the version information (the "." and the numbers following it) included, so you may wish to trim that information off as well. I don't know whether or not that applies to biomaRt, but it might.
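
One possible way to trim the version information is a base R regular expression, sketched below. The three IDs are taken from the list posted above; the variable names are only for illustration.

```r
ids <- c("ENSMUST00000000001.5", "ENSMUST00000000003.14", "ENSMUST00000000010.9")

# Remove a trailing "." followed by one or more digits;
# IDs that carry no version suffix are left unchanged
ids_no_version <- sub("\\.[0-9]+$", "", ids)

print(ids_no_version)
```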


From Mike Smith, the developer of biomaRt: Mapping Ensembl Gene IDs with dot suffix


So if they are transcript IDs, how can we convert them into gene names?


If you trim off the version information using the instructions in GenoMax's link above, and then filter on the ensembl_transcript_id, it should work.

Here is an example that worked for me, returning the Gene ID, Transcript ID and Gene Name successfully.

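library(biomaRt)

# Same mouse dataset as in the original question
mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

# Transcript ID with the version suffix (".5") already removed
myid <- "ENSMUST00000000001"

# Filter on the transcript ID and return gene ID, gene name and transcript ID
test_annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "ensembl_transcript_id"),
                         filters = "ensembl_transcript_id",
                         values = myid,
                         mart = mart)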
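
To apply the same idea to a whole expression table, something along these lines might work. This is only a sketch under assumptions: the column names `transcript_id` and `sample1`, the counts, and the hand-written `lookup` table are placeholders so that it runs without a BioMart connection. With a live connection, `lookup` would come from a `getBM()` call like the commented one.

```r
# Toy expression table; the counts are made-up placeholder values
expression_data <- data.frame(
  transcript_id    = c("ENSMUST00000000001.5", "ENSMUST00000000003.14", "ENSMUST00000000010.9"),
  sample1          = c(10, 0, 25),
  stringsAsFactors = FALSE
)

# Strip the version suffix before matching
expression_data$transcript_no_version <- sub("\\.[0-9]+$", "", expression_data$transcript_id)

# With a live connection the lookup would come from something like:
# lookup <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "external_gene_name"),
#                 filters = "ensembl_transcript_id",
#                 values = unique(expression_data$transcript_no_version),
#                 mart = mart)
# Hand-written stand-in (illustrative rows; the third transcript is left out on purpose)
lookup <- data.frame(
  ensembl_transcript_id = c("ENSMUST00000000001", "ENSMUST00000000003"),
  ensembl_gene_id       = c("ENSMUSG00000000001", "ENSMUSG00000000003"),
  external_gene_name    = c("Gnai3", "Pbsn"),
  stringsAsFactors      = FALSE
)

# Keep every expression row; transcripts missing from the lookup get NA
annotated_data <- merge(expression_data, lookup,
                        by.x = "transcript_no_version",
                        by.y = "ensembl_transcript_id",
                        all.x = TRUE)

print(annotated_data)
```

Rows that still show NA after this step are the ones worth checking by hand, since they were not found even with the version removed.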

ENSMUSG00000064336: I feel it's not a mouse annotation?


"ENSMUSG00000064336: I feel it's not a mouse annotation?"

It is mouse: https://www.ensembl.org/Multi/Search/Results?q=ENSMUSG00000064336


It's mouse, sorry. But I feel Ensembl did not fetch it properly.

4 weeks ago

Are you able to manually match your test_ids <- head(ensembl_ids, 10) by taking one and googling it? It's worth doing a sanity check that your Ensembl IDs are good.
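
Alongside a manual lookup, a quick format check in R can show what kind of IDs you have. The helper `classify_ensembl_id` below is a hypothetical function written for this post, not part of biomaRt. It relies on the Ensembl stable ID layout for mouse: `ENSMUSG` (gene) or `ENSMUST` (transcript) followed by 11 digits, with an optional version suffix.

```r
# Hypothetical helper: label each ID by its Ensembl mouse prefix
classify_ensembl_id <- function(ids) {
  core <- sub("\\.[0-9]+$", "", ids)  # ignore any version suffix
  ifelse(grepl("^ENSMUSG[0-9]{11}$", core), "mouse gene",
         ifelse(grepl("^ENSMUST[0-9]{11}$", core), "mouse transcript", "other"))
}

test_ids <- c("ENSMUST00000000001.5", "ENSMUSG00000064336", "not_an_id")
id_types <- classify_ensembl_id(test_ids)

# Tabulate how many IDs fall into each class
print(table(id_types))
```

If most IDs come back as "mouse transcript", the filter in `getBM()` would need to be `ensembl_transcript_id` rather than `ensembl_gene_id`.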
