Question

Error converting Ensembl IDs to gene names

Asked by anasjamshed

I have mouse gene expression data, but I am unable to convert the Ensembl IDs to gene names with this code:

    # Load required libraries
    library(biomaRt)

    # Read the gene expression data
    expression_data <- read.delim("combined_expression.tsv", header = TRUE, sep = "\t")

    # Extract Ensembl IDs from the first column
    ensembl_ids <- expression_data[, 1]  # Assuming the first column contains Ensembl IDs

    # Use biomaRt to map Ensembl IDs to gene names
    mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

    # Get gene names corresponding to Ensembl IDs
    annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                        filters = "ensembl_gene_id",
                        values = ensembl_ids,
                        mart = mart)

    # Merge the annotation with the expression data
    # Match annotation based on Ensembl IDs
    annotated_data <- merge(annotation, expression_data, by.x = "ensembl_gene_id", by.y = colnames(expression_data)[1], all.y = TRUE)

    # Save the updated expression data as a CSV file
    write.csv(annotated_data, "updated_expression_data.csv", row.names = FALSE)

    # Print a message indicating completion
    print("Gene names successfully mapped and saved to updated_expression_data.csv.")

But the external_gene_name column in the updated CSV file contains only NA values.

I then tried again with only the first 10 Ensembl IDs:

    test_ids <- head(ensembl_ids, 10)
    test_annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                             filters = "ensembl_gene_id",
                             values = test_ids,
                             mart = mart)
    print(test_annotation)

But it returns an empty data frame:

    [1] ensembl_gene_id    external_gene_name
    <0 rows> (or 0-length row.names)

Tags: Ensembl, BioMart

score 1 · Answer 1 · 2024-12-11 · Istvan Albert

It sounds like something is not matching, but that is hard to debug directly.

Instead, fetch all the annotations first and save them to a file, so that you don't have to rerun the query every time.

Basically, separate fetching the annotations from using them; that way you'll better understand what you have.

    library(biomaRt)

    # Connect to the mouse gene dataset in Ensembl BioMart
    mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

    # No filter: fetch the gene ID and gene name for every mouse gene
    annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                        mart = mart)

    # Save to disk so the query does not have to be rerun
    write.csv(annotation, file = "annotation.csv", row.names = FALSE)

Then, in a separate part of the code, load the saved annotations and work out why your IDs do not match.
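A minimal sketch of that second step, assuming the saved file has the two columns requested above (`ensembl_gene_id`, `external_gene_name`). The helper name `check_id_matches` is hypothetical, and the toy annotation in the usage lines uses a placeholder gene name rather than a real query result.

```r
# Hypothetical helper: report how many of your IDs appear in a saved annotation table.
# Assumes `annotation` has an `ensembl_gene_id` column, as returned by the getBM() call above.
check_id_matches <- function(ids, annotation, n_show = 5) {
  ids <- as.character(ids)
  found <- ids %in% annotation$ensembl_gene_id
  cat(sum(found), "of", length(ids), "IDs found in the annotation\n")
  if (any(!found)) {
    cat("Examples of unmatched IDs:\n")
    print(head(ids[!found], n_show))
  }
  invisible(found)
}

# In practice you would load the file saved earlier, e.g.:
# annotation <- read.csv("annotation.csv", stringsAsFactors = FALSE)
# found <- check_id_matches(expression_data[, 1], annotation)

# Toy annotation (the gene name is a hypothetical placeholder)
annotation <- data.frame(
  ensembl_gene_id    = "ENSMUSG00000064336",
  external_gene_name = "geneA",
  stringsAsFactors   = FALSE
)
found <- check_id_matches(c("ENSMUST00000000001.5", "ENSMUSG00000064336"), annotation)
```

Printing a few unmatched IDs is usually enough to spot a systematic mismatch, such as a different ID type or an extra suffix.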

These are the first 10 Ensembl IDs:

    "ENSMUST00000000001.5"  "ENSMUST00000000003.14" "ENSMUST00000000010.9"
    "ENSMUST00000000028.14" "ENSMUST00000000033.12" "ENSMUST00000000049.6"
    "ENSMUST00000000058.7"  "ENSMUST00000000080.8"  "ENSMUST00000000087.13"
    "ENSMUST00000000090.8"

The problem is that they have a version suffix (a decimal point followed by a number), and that is the main reason the names are not being fetched.

Reply by anasjamshed

Those are transcript IDs (ENSMUST...), not gene IDs (ENSMUSG...). That may be at least part of the issue.
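A quick way to see which kind of ID a column holds is to look at the prefix: for mouse, `ENSMUSG` marks genes, `ENSMUST` transcripts and `ENSMUSP` proteins. A minimal sketch; the function name `ensembl_id_type` is hypothetical, and it only recognises these three mouse prefixes.

```r
# Hypothetical helper: label mouse Ensembl IDs by their prefix.
# ENSMUSG = gene, ENSMUST = transcript, ENSMUSP = protein; anything else is "other".
ensembl_id_type <- function(ids) {
  ids <- as.character(ids)
  type <- rep("other", length(ids))
  type[startsWith(ids, "ENSMUSG")] <- "gene"
  type[startsWith(ids, "ENSMUST")] <- "transcript"
  type[startsWith(ids, "ENSMUSP")] <- "protein"
  type
}

ids <- c("ENSMUST00000000001.5", "ENSMUSG00000064336", "ENSMUST00000000003.14")
table(ensembl_id_type(ids))
```

If most of the column comes back as "transcript", a query filtered on `ensembl_gene_id` is expected to return no rows.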

In my experience, some of these tools also do not accept IDs that include the version information (the "." and the number following it), so you may want to trim that off as well. I don't know whether that applies to biomaRt, but it might.
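Trimming the version can be done with one regular expression in base R. A minimal sketch; the helper name is hypothetical, and it assumes the version is always the final "." followed by digits, as in the IDs posted above.

```r
# Remove the trailing version suffix (".5", ".14", ...) from Ensembl IDs.
# Only a final "." followed by digits is removed, so unversioned IDs are left unchanged.
strip_ensembl_version <- function(ids) {
  sub("\\.[0-9]+$", "", as.character(ids))
}

versioned <- c("ENSMUST00000000001.5", "ENSMUST00000000003.14", "ENSMUSG00000064336")
strip_ensembl_version(versioned)
# [1] "ENSMUST00000000001" "ENSMUST00000000003" "ENSMUSG00000064336"
```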

Reply by Dave Carlson

From Mike Smith, the developer of biomaRt: Mapping Ensembl Gene IDs with dot suffix

Reply by GenoMax

So if they are transcript IDs, how can I convert them into gene names?

Reply by anasjamshed

If you trim off the version information using the instructions in GenoMax's link above and then filter on ensembl_transcript_id, it should work.

Here is an example that worked for me; it successfully returned the gene ID, transcript ID and gene name.

    # Transcript ID with the version suffix removed
    myid <- "ENSMUST00000000001"

    test_annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "ensembl_transcript_id"),
                             filters = "ensembl_transcript_id",
                             values = myid,
                             mart = mart)
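To go from there to annotated expression data, one option is to strip the versions from the whole ID column, query on `ensembl_transcript_id`, and merge the result back. A minimal sketch: both data frames below are hypothetical toy data standing in for the expression table and the getBM() output (so it runs without network access), and the column name `transcript` is an assumption.

```r
# Toy expression table with versioned transcript IDs in the first column (hypothetical values)
expression_data <- data.frame(
  transcript = c("ENSMUST00000000001.5", "ENSMUST00000000003.14"),
  sample1    = c(10, 3),
  stringsAsFactors = FALSE
)

# Stand-in for the getBM() result; in real use it would come from something like:
# getBM(attributes = c("ensembl_gene_id", "external_gene_name", "ensembl_transcript_id"),
#       filters = "ensembl_transcript_id",
#       values = unique(expression_data$ensembl_transcript_id), mart = mart)
annotation <- data.frame(
  ensembl_gene_id       = "ENSMUSG00000000001",  # toy value
  external_gene_name    = "geneA",               # hypothetical placeholder name
  ensembl_transcript_id = "ENSMUST00000000001",
  stringsAsFactors = FALSE
)

# Strip the version suffix so the IDs match the unversioned IDs used by BioMart
expression_data$ensembl_transcript_id <- sub("\\.[0-9]+$", "", expression_data[, 1])

# all.x = TRUE keeps transcripts without an annotation (their gene columns become NA)
annotated_data <- merge(expression_data, annotation,
                        by = "ensembl_transcript_id", all.x = TRUE)
annotated_data
```

Several transcripts usually belong to the same gene, so gene names will repeat across rows; whether to keep transcript-level rows or combine them per gene depends on the downstream analysis.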

Reply by Dave Carlson

ENSMUSG00000064336: I feel this is not a mouse annotation?

Reply by anasjamshed

> ENSMUSG00000064336 I feel its not mouse annotation?

It is mouse: https://www.ensembl.org/Multi/Search/Results?q=ENSMUSG00000064336

Reply by GenoMax

It's mouse, sorry. But I feel Ensembl did not fetch it properly.

Reply by anasjamshed

score 0 · Answer 2 · 2024-12-11 · yura.grabovska

Are you able to manually match the IDs in test_ids <- head(ensembl_ids, 10), for example by taking one and searching for it online? It is worth doing a sanity check that your Ensembl IDs are valid.
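For that sanity check, it can help to generate search links for a handful of IDs instead of typing them in. A minimal sketch; the helper `ensembl_search_urls` is hypothetical, and the URL pattern is copied from the Ensembl search link GenoMax posted earlier in the thread.

```r
# Build Ensembl search links for the first few IDs so they can be checked by hand.
# Uses the same search URL pattern as the Ensembl link posted in this thread.
ensembl_search_urls <- function(ids, n = 3) {
  ids <- head(as.character(ids), n)
  paste0("https://www.ensembl.org/Multi/Search/Results?q=", ids)
}

urls <- ensembl_search_urls(c("ENSMUST00000000001.5", "ENSMUSG00000064336"))
print(urls)
# utils::browseURL(urls[1])  # optionally open the first link in a web browser
```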
