Question

Fetching HGNC symbols using the R package biomaRt

5.0 years ago

omicsnstuff • 0

I am trying to retrieve the HGNC symbols for genes after some high-throughput RNA-seq, but my code isn't working. Can anyone spot the error or tell me how to do this, please?

dds_covid is my data frame.

I am running biomaRt on R 4.0.0.

My code:

dds_covid_df<- sapply( strsplit( rownames(dds_covid), split="\\+" ), "[", 1 )
  ensembl = useMart("ENSEMBL_MART_ENSEMBL",dataset="hsapiens_gene_ensembl", )
  genemap <- getBM( attributes = c("ensembl_gene_id_version", "hgnc_symbol"), filters = "ensembl_gene_id_version", values = dds_covid_df, mart = ensembl)
                    filters = "ensembl_gene_id",
                    values = m,
                    mart = ensembl )
  idx <- match( dds_covid_df, genemap$ensembl_gene_id )
dds_covid$hgnc_symbol <- genemap$hgnc_symbol[ idx ]

My results:

               Gene_ID hgnc_symbol
1    ENSG00000242268.3          NA
2    ENSG00000270112.4          NA
3    ENSG00000280143.1          NA
4   ENSG00000146083.12          NA
5    ENSG00000263642.1          NA
6    ENSG00000225275.4          NA
7   ENSG00000158486.13          NA
8    ENSG00000283967.1          NA
9    ENSG00000273639.6          NA

R biomaRt RNA-Seq • 1.2k views

Updated 5.0 years ago by Kevin Blighe 90k • written 5.0 years ago by omicsnstuff • 0

score 1 · Answer 1 · 2020-11-26

5.0 years ago

Kevin Blighe 90k

Hi,

You just need to 'knock off' (remove) that number at the end of each Ensembl gene ID, which relates to the ID version (I think). Something like:

sub('\\.[0-9]*$', '', m)
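
To put that in context, here is a minimal sketch of the whole lookup, assuming the versioned IDs may not match the versions in the current Ensembl release, so the unversioned "ensembl_gene_id" filter is used instead. The helper name fetch_hgnc and the vector names ids and ids_noversion are made up for illustration; the three IDs are copied from the question's output. The query itself needs the biomaRt package and internet access, so it is left commented out.

```r
# Example IDs copied from the question's output
ids <- c("ENSG00000242268.3", "ENSG00000146083.12", "ENSG00000158486.13")

# Drop the trailing ".<version>" so the IDs fit the "ensembl_gene_id" filter
ids_noversion <- sub("\\.[0-9]*$", "", ids)

# Hypothetical helper (not part of biomaRt): look up HGNC symbols and
# return them in the same order as the input IDs
fetch_hgnc <- function(gene_ids) {
  ensembl <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
                              dataset = "hsapiens_gene_ensembl")
  genemap <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "hgnc_symbol"),
    filters    = "ensembl_gene_id",
    values     = unique(gene_ids),
    mart       = ensembl
  )
  # Match on the same column that was requested in `attributes`
  genemap$hgnc_symbol[match(gene_ids, genemap$ensembl_gene_id)]
}

# Uncomment to query Ensembl (requires biomaRt and a network connection):
# symbols <- fetch_hgnc(ids_noversion)
```

Genes without an HGNC symbol would still come back as NA or an empty string, so some missing values are expected even when the lookup works.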

Kevin
