You could try biomaRt:
library("biomaRt")
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
ens <- c("ENSG00000100601.5", "ENSG00000178826.6",
"ENSG00000243663.1", "ENSG00000138231.8")
ensLookup <- gsub("\\.[0-9]*$", "", ens)
ensLookup
[1] "ENSG00000100601" "ENSG00000178826" "ENSG00000243663" "ENSG00000138231"
annotLookup <- getBM(
mart=mart,
attributes=c("ensembl_transcript_id", "ensembl_gene_id",
"gene_biotype", "external_gene_name"),
filters="ensembl_gene_id",
values=ensLookup,
uniqueRows=TRUE)
annotLookup <- data.frame(
ens[match(annotLookup$ensembl_gene_id, ensLookup)],
annotLookup)
colnames(annotLookup) <- c(
"original_id",
c("ensembl_transcript_id", "ensembl_gene_id",
"gene_biotype", "external_gene_name"))
annotLookup
original_id ensembl_transcript_id ensembl_gene_id gene_biotype
1 ENSG00000100601.5 ENST00000216489 ENSG00000100601 protein_coding
2 ENSG00000100601.5 ENST00000557057 ENSG00000100601 protein_coding
3 ENSG00000100601.5 ENST00000555100 ENSG00000100601 protein_coding
4 ENSG00000100601.5 ENST00000554097 ENSG00000100601 protein_coding
5 ENSG00000138231.8 ENST00000260803 ENSG00000138231 protein_coding
6 ENSG00000138231.8 ENST00000460271 ENSG00000138231 protein_coding
7 ENSG00000138231.8 ENST00000477557 ENSG00000138231 protein_coding
8 ENSG00000138231.8 ENST00000463982 ENSG00000138231 protein_coding
9 ENSG00000178826.6 ENST00000409102 ENSG00000178826 protein_coding
10 ENSG00000178826.6 ENST00000487419 ENSG00000178826 protein_coding
11 ENSG00000178826.6 ENST00000359333 ENSG00000178826 protein_coding
12 ENSG00000178826.6 ENST00000480421 ENSG00000178826 protein_coding
13 ENSG00000178826.6 ENST00000409244 ENSG00000178826 protein_coding
14 ENSG00000178826.6 ENST00000409541 ENSG00000178826 protein_coding
15 ENSG00000178826.6 ENST00000410004 ENSG00000178826 protein_coding
16 ENSG00000178826.6 ENST00000482420 ENSG00000178826 protein_coding
17 ENSG00000178826.6 ENST00000471161 ENSG00000178826 protein_coding
18 ENSG00000243663.1 ENST00000493072 ENSG00000243663 processed_pseudogene
external_gene_name
1 ALKBH1
2 ALKBH1
3 ALKBH1
4 ALKBH1
5 DBR1
6 DBR1
7 DBR1
8 DBR1
9 TMEM139
10 TMEM139
11 TMEM139
12 TMEM139
13 TMEM139
14 TMEM139
15 TMEM139
16 TMEM139
17 TMEM139
18 RPS4XP14
...or without ensembl_transcript_id:
annotLookup <- getBM(
mart=mart,
attributes=c("ensembl_gene_id", "gene_biotype", "external_gene_name"),
filters="ensembl_gene_id",
values=ensLookup,
uniqueRows=TRUE)
annotLookup <- data.frame(
ens[match(annotLookup$ensembl_gene_id, ensLookup)],
annotLookup)
colnames(annotLookup) <- c(
"original_id",
c("ensembl_gene_id", "gene_biotype", "external_gene_name"))
annotLookup
original_id ensembl_gene_id gene_biotype external_gene_name
1 ENSG00000100601.5 ENSG00000100601 protein_coding ALKBH1
2 ENSG00000138231.8 ENSG00000138231 protein_coding DBR1
3 ENSG00000178826.6 ENSG00000178826 protein_coding TMEM139
4 ENSG00000243663.1 ENSG00000243663 processed_pseudogene RPS4XP14
Otherwise, you can always remove the string after the period.
Hello Sukhdeep,
I have exactly the same question as User6891, and when I try to remove the decimal I get an error.
Could you please help me with this?
The command should work. I see an unidentified symbol in the command you pasted.
Try typing it out by hand and see if it works!
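If retyping is not practical, one way to look for a stray symbol is to check the pasted text for non-ASCII characters. This is only a sketch in base R: the pasted string below is hypothetical (curly quotes stand in for whatever symbol was actually pasted, which is not shown in this thread).

```r
# Hypothetical pasted command; the curly quotes (\u201c, \u201d) are an assumption
pasted <- "tmp <- gsub(\u201c\\\\..*\u201d, \u201c\u201d, row.names(res))"

# Convert to Unicode code points; anything above 127 is outside plain ASCII
codes <- utf8ToInt(pasted)
has_stray <- any(codes > 127)

# List the distinct offending characters
stray_chars <- unique(intToUtf8(codes[codes > 127], multiple = TRUE))
```

If has_stray is TRUE, replacing the characters in stray_chars with their plain-keyboard equivalents (for example straight double quotes) is usually enough.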
This is my command ...and it shows a question mark in the error.
As I said, the above command should work, unless you have a copy-paste error or the object res has some issue. Check row.names(res): what does it output?
It's working, thanks a lot :) and thanks for your patience.
But one more question: how do I put the edited Ensembl IDs from tmp back into the row names of res?
I know it is a very basic question but I am new to R.
Thanks a lot Sukhdeep ...it all worked fine :)
Great, good luck then!
How did you eventually add tmp back to the res row.names? The answer is not in this thread and I can't figure it out.
Also, is it possible to edit the gene ids in-place instead of creating 'tmp'?
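The thread never shows this step, so here is a minimal sketch of both variants. It assumes res is a data frame whose row names are the versioned Ensembl IDs; the toy res and its baseMean column are made up for illustration, since the real object is not shown here.

```r
# Toy stand-in for 'res' (hypothetical data, not from this thread)
res <- data.frame(
  baseMean = c(10.2, 55.1, 3.4),
  row.names = c("ENSG00000100601.5", "ENSG00000178826.6", "ENSG00000243663.1"))
res_inplace <- res  # copy, to show the second variant independently

# Variant 1: go through 'tmp', then assign it back as the row names
tmp <- gsub("\\..*", "", row.names(res))
row.names(res) <- tmp

# Variant 2: edit in place, without creating 'tmp'
row.names(res_inplace) <- gsub("\\..*", "", row.names(res_inplace))
```

Note that a data frame requires unique row names, so the assignment fails if two versioned IDs collapse to the same unversioned ID.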
Can you explain what "\\..*","", does?
It removes the period and the string after it, i.e. it deletes (technically, substitutes with an empty string) the first period and everything that follows. See this.
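To make the pattern concrete, here is a small sketch comparing it with the stricter pattern used in the biomaRt answer at the top. The ids vector is just an example; the last entry has no version suffix, to show that unversioned IDs pass through unchanged.

```r
ids <- c("ENSG00000100601.5", "ENSG00000243663.1", "ENSG00000138231")

# "\\." is a literal period (the backslash is doubled inside an R string),
# ".*" is any run of characters: together, the first period and all that follows
stripped_any <- gsub("\\..*", "", ids)

# Stricter variant from the biomaRt answer: only a trailing period plus digits
stripped_version <- gsub("\\.[0-9]*$", "", ids)
```

For plain versioned IDs like these the two give the same result; they differ only when something other than digits follows the period, in which case the first pattern still strips it and the second leaves the ID untouched.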