Question

How to get the Entrez ID and symbol for my gene identifiers in Saccharomyces cerevisiae

6.2 years ago

cesarihv7 • 0

I have these genes from Saccharomyces cerevisiae:

genes <- c("YAL002W", "YAL003W","YAL004W", "YAL005C","YAL007C","YAL008W","YAL009W","YAL010C", "ETS1-1", "ETS1-2","ETS2-1","ETS2-2", "HRA1", "ICR1", "IRT1", "ITS1-1")

and I have tried this:

my.simbols <- genes

sc <- org.Sc.sgd.db
select(sc, keys = my.simbols, columns = c("ENTREZID", "SYMBOL", "GENEID"), keytype = "SYMBOL")

and this is the output error:

Error in testForValidKeytype(x, keytype) : Invalid keytype: SYMBOL. Please use the keytypes method to see a listing of valid arguments.

RNA-Seq • 3.7k views


Many thanks, SMK and ricket.woo. I think I need to read the org.Sc.sgd.db manual, because as far as I can see this is a basic problem. However, in columns(org.Sc.sgd.db) I can't see a transcript length option. Can you help me get the transcript length?

Reply by cesarihv7 • 6.2 years ago

score 3 · Answer 1 · 2019-06-18

6.2 years ago

AK ★ 2.2k

Hi cesarihv7,

I think you can use:

select(
  sc,
  keys = my.simbols,
  columns = c("ENTREZID", "GENENAME", "SGD")
)

Which returns:

> select(
+   sc,
+   keys = my.simbols,
+   columns = c("ENTREZID", "GENENAME", "SGD")
+ )
'select()' returned 1:1 mapping between keys and columns
       ORF ENTREZID GENENAME        SGD
1  YAL002W   851261     VPS8 S000000002
2  YAL003W   851260     EFB1 S000000003
3  YAL004W     <NA>     <NA> S000002136
4  YAL005C   851259     SSA1 S000000004
5  YAL007C   851226     ERP2 S000000005
6  YAL008W   851225    FUN14 S000000006
7  YAL009W   851224     SPO7 S000000007
8  YAL010C   851223    MDM10 S000000008
9   ETS1-1  9164941   ETS1-1 S000029717
10  ETS1-2  9164933   ETS1-2 S000029707
11  ETS2-1  9164936   ETS2-1 S000029718
12  ETS2-2  9164942   ETS2-2 S000029713
13    HRA1  9164866     HRA1 S000119380
14    ICR1  9164906     ICR1 S000132612
15    IRT1 23547381     IRT1 S000178119
16  ITS1-1  9164938   ITS1-1 S000029715

And you can use columns(org.Sc.sgd.db) to check what fields are available.

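As a minimal sketch of that check (assuming org.Sc.sgd.db is installed from Bioconductor; the three-gene subset is only for illustration), listing the valid key types first shows which identifiers the package accepts, and passing keytype = "ORF" explicitly makes the lookup unambiguous. Calling AnnotationDbi::select() with its namespace is a precaution in case another loaded package, such as dplyr, masks select():

```r
# Subset of the ORF identifiers from the question, for illustration only.
genes <- c("YAL002W", "YAL003W", "YAL005C")
res <- NULL

# Guarded so the script still runs where the annotation package is missing.
if (requireNamespace("org.Sc.sgd.db", quietly = TRUE)) {
  library(org.Sc.sgd.db)
  sc <- org.Sc.sgd.db

  print(keytypes(sc))  # identifier types accepted as `keytype`
  print(columns(sc))   # fields that can be requested in `columns`

  # Look the genes up by systematic ORF name instead of "SYMBOL".
  res <- AnnotationDbi::select(
    sc,
    keys = genes,
    columns = c("ENTREZID", "GENENAME", "SGD"),
    keytype = "ORF"
  )
  print(res)
}
```

If the package is available, res should contain an ORF column with the three input identifiers; otherwise it stays NULL.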

Many thanks, SMK. I think I need to read the org.Sc.sgd.db manual, because as far as I can see this is a basic problem. However, in columns(org.Sc.sgd.db) I can't see a transcript length option. Can you help me get the transcript length?

Reply by cesarihv7 • 6.2 years ago

You can try biomaRt:

> library("biomaRt")
> ensembl <- useMart("ensembl", dataset = "scerevisiae_gene_ensembl")
> getBM(
+   attributes = c("ensembl_gene_id", "transcript_length", "external_gene_name", "entrezgene"),
+   filters = "ensembl_gene_id",
+   values = genes,
+   mart = ensembl
+ )
   ensembl_gene_id transcript_length external_gene_name entrezgene
1           ETS1-1               700                            NA
2           ETS1-2               700                            NA
3           ETS2-1               211                            NA
4           ETS2-2               211                            NA
5             HRA1               564                            NA
6             ICR1              3199                            NA
7             IRT1              1489                            NA
8           ITS1-1               361                            NA
9          YAL002W              3825               VPS8     851261
10         YAL003W               621               EFB1     851260
11         YAL004W               648                            NA
12         YAL005C              1929               SSA1     851259
13         YAL007C               648                        851226
14         YAL008W               597                        851225
15         YAL009W               780                        851224
16         YAL010C              1482              MDM10     851223

Available attributes can be shown using listAttributes(ensembl).

Reply by AK • 6.2 years ago
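To get the identifiers and the transcript lengths into one table, the select() result and the getBM() result can be joined on the ORF identifier. A minimal base R sketch, with three rows copied by hand from the two outputs in this thread (the object names ann, len and combined are placeholders, not part of either package):

```r
# Three rows copied from the select() output in the answer.
ann <- data.frame(
  ORF = c("YAL002W", "YAL003W", "YAL005C"),
  ENTREZID = c("851261", "851260", "851259"),
  GENENAME = c("VPS8", "EFB1", "SSA1"),
  stringsAsFactors = FALSE
)

# The matching rows copied from the getBM() output.
len <- data.frame(
  ensembl_gene_id = c("YAL002W", "YAL003W", "YAL005C"),
  transcript_length = c(3825L, 621L, 1929L),
  stringsAsFactors = FALSE
)

# Left join on the ORF name; all.x = TRUE keeps genes without a length.
combined <- merge(ann, len, by.x = "ORF", by.y = "ensembl_gene_id", all.x = TRUE)
print(combined)
```

With all.x = TRUE, genes that biomaRt does not return stay in the table with NA as their length instead of being dropped.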

Hi again. After closing and reopening my R session, I ran this command:

select(sc, keys = my.simbols, columns = c("ENTREZID", "GENENAME", "GO", "PATH"))

and got the following message: 'select()' returned 1:many mapping between keys and columns

Instead of the 7000 genes I expected, the result has more than 100,000 rows, which is incorrect.

Please help.

Reply by cesarihv7 • 6.2 years ago
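A likely explanation, offered as a sketch and not a diagnosis: columns such as GO and PATH can hold several values per gene, so select() repeats each gene once per value and the row count grows well beyond the gene count, while the number of distinct genes stays the same. The toy data frame below (made-up gene names and values, not real annotation) mimics that shape and collapses it back to one row per gene with base R:

```r
# Toy stand-in for a select() result with a multi-valued column.
# All identifiers here are invented for illustration.
res <- data.frame(
  ORF = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B"),
  ENTREZID = c("1", "1", "1", "2", "2"),
  GO = c("GO:0000001", "GO:0000002", "GO:0000003", "GO:0000001", "GO:0000004"),
  stringsAsFactors = FALSE
)

# Rows outnumber genes because each gene repeats once per GO term.
n_rows <- nrow(res)
n_genes <- length(unique(res$ORF))

# Collapse to one row per gene, joining the distinct GO terms with ";".
# Note: the formula interface drops rows that contain NA values.
collapsed <- aggregate(
  GO ~ ORF + ENTREZID,
  data = res,
  FUN = function(x) paste(unique(x), collapse = ";")
)
print(collapsed)
```

On the real result, length(unique(res$ORF)) should still match the number of input genes. AnnotationDbi::mapIds() with multiVals = "list" is another way to keep one entry per key.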

score 0 · Answer 2 · 2019-06-18

6.2 years ago

ricket.woo • 0

You can type: keytypes(sc) and columns(sc) to know what kind of information you can get from this AnnotationDb object.
