How to add gene symbol to RNA-Seq data using R
6.3 years ago

Hi,

I am trying to add gene symbols to my RNA-Seq data. I used Salmon to quantify my reads, but instead of running a DE analysis I want to try an alternative program and do a little machine learning. So I really just need to add the gene symbol next to each transcript ID, so that I can easily identify the gene instead of having to look up the transcript ID in Ensembl.

I know there are some packages out there that can help with this like org.Cf.eg.db. I can't seem to figure out how to make the package work with transcript IDs though. I am inexperienced with the package so that is most likely my issue. Here is an example of what my data looks like after it was quantified.

> head(test)
                  Name Length EffectiveLength      TPM   NumReads
1 ENSCAFT00000034820.1    957         736.829 1309.272  43423.000
2 ENSCAFT00000034824.1   1044         823.630 1001.516  37129.000
3 ENSCAFT00000034830.1   1545        1324.630 3796.436 226357.000
4 ENSCAFT00000034833.1    684         464.046 8086.686 168910.000
5 ENSCAFT00000034835.1    204          50.476 4033.303   9163.596
6 ENSCAFT00000034836.1    681         461.059 9748.035 202300.391

I really just want to add the gene symbol so the data looks something like this

> head(test)
                  Name Length EffectiveLength      TPM   NumReads Symbol
1 ENSCAFT00000034820.1    957         736.829 1309.272  43423.000 ABC
2 ENSCAFT00000034824.1   1044         823.630 1001.516  37129.000 ABD
3 ENSCAFT00000034830.1   1545        1324.630 3796.436 226357.000 ABE
4 ENSCAFT00000034833.1    684         464.046 8086.686 168910.000 ABF
5 ENSCAFT00000034835.1    204          50.476 4033.303   9163.596 ABG
6 ENSCAFT00000034836.1    681         461.059 9748.035 202300.391 ABH
R rna-seq • 6.1k views
6.3 years ago
Prakash ★ 2.2k

You can use the R package "biomaRt" to map your transcript IDs to gene names. See if the code below works:

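library("biomaRt")
mart <- useMart("ensembl")
# List all Ensembl datasets (one per organism)
listDatasets(mart)
# Choose the dataset of interest; in this case "cfamiliaris_gene_ensembl" I guess (confirm the exact name in the listDatasets() output)
ensembl <- useMart("ensembl", dataset = "cfamiliaris_gene_ensembl")
# List the attributes that can be retrieved
listAttributes(ensembl)
# The Salmon IDs carry a version suffix (".1"); strip it so they match ensembl_transcript_id
tx_id <- sub("\\.[0-9]+$", "", test$Name)
# values is only applied together with filters, so name the filter explicitly
gene <- getBM(attributes = c("ensembl_transcript_id", "external_gene_name"), filters = "ensembl_transcript_id", values = tx_id, mart = ensembl)
# Match your transcript IDs with ensembl_transcript_id
id <- match(tx_id, gene$ensembl_transcript_id)
# Add the gene symbol column to your data frame (NA where no match was found)
test$Symbol <- gene$external_gene_name[id]
head(test)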
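
One thing to check when matching: the Name column from Salmon carries a version suffix (".1"), whereas ensembl_transcript_id from BioMart is unversioned, so the two need to be brought to the same form before match(). Here is a minimal offline sketch of that step. The small `gene` table is a made-up stand-in for a getBM() result, and its symbols are the placeholders from the question, not real annotations.

```r
# Toy stand-in for the Salmon table; transcript IDs copied from the question.
test <- data.frame(
  Name = c("ENSCAFT00000034820.1", "ENSCAFT00000034824.1", "ENSCAFT00000034830.1"),
  TPM  = c(1309.272, 1001.516, 3796.436),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table in the shape getBM() returns.
# The symbols are placeholders, and one transcript is deliberately missing.
gene <- data.frame(
  ensembl_transcript_id = c("ENSCAFT00000034830", "ENSCAFT00000034820"),
  external_gene_name    = c("ABE", "ABC"),
  stringsAsFactors = FALSE
)

# Drop the trailing ".<version>" so the IDs are comparable.
tx_id <- sub("\\.[0-9]+$", "", test$Name)

# match() gives the row in `gene` for each transcript, or NA if it is absent.
test$Symbol <- gene$external_gene_name[match(tx_id, gene$ensembl_transcript_id)]
test
```

Rows that end up with NA in Symbol are transcripts that were not found in the lookup table, so it is worth counting them with sum(is.na(test$Symbol)) before using the column.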