Hi,
I am trying to add gene IDs to my RNA-Seq data. I used Salmon to quantify my reads, but instead of doing a DE analysis I want to use an alternative program to try a little machine learning. So I really just need to add the gene symbol next to each transcript ID, so that I can identify the gene easily instead of having to look up every transcript ID in Ensembl.
I know there are packages that can help with this, such as org.Cf.eg.db, but I can't figure out how to make the package work with transcript IDs. I am inexperienced with the package, so that is most likely my issue. Here is an example of what my data looks like after quantification:
> head(test)
                  Name Length EffectiveLength      TPM   NumReads
1 ENSCAFT00000034820.1    957         736.829 1309.272  43423.000
2 ENSCAFT00000034824.1   1044         823.630 1001.516  37129.000
3 ENSCAFT00000034830.1   1545        1324.630 3796.436 226357.000
4 ENSCAFT00000034833.1    684         464.046 8086.686 168910.000
5 ENSCAFT00000034835.1    204          50.476 4033.303   9163.596
6 ENSCAFT00000034836.1    681         461.059 9748.035 202300.391
I just want to add the gene symbol so that the data looks something like this:
> head(test)
                  Name Length EffectiveLength      TPM   NumReads Symbol
1 ENSCAFT00000034820.1    957         736.829 1309.272  43423.000    ABC
2 ENSCAFT00000034824.1   1044         823.630 1001.516  37129.000    ABD
3 ENSCAFT00000034830.1   1545        1324.630 3796.436 226357.000    ABE
4 ENSCAFT00000034833.1    684         464.046 8086.686 168910.000    ABF
5 ENSCAFT00000034835.1    204          50.476 4033.303   9163.596    ABG
6 ENSCAFT00000034836.1    681         461.059 9748.035 202300.391    ABH
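To make the goal concrete, here is a minimal sketch of the kind of lookup I have in mind. It rests on assumptions I have not confirmed: that org.Cf.eg.db offers an ENSEMBLTRANS key type, that its transcript IDs carry no version suffix (so the trailing ".1" has to be stripped first), and that AnnotationDbi's mapIds() is the right call for this. The small data frame only rebuilds the first rows shown above so the snippet runs on its own.

```r
# First rows of the Salmon output shown above (only the columns needed here).
test <- data.frame(
  Name = c("ENSCAFT00000034820.1", "ENSCAFT00000034824.1", "ENSCAFT00000034830.1"),
  TPM = c(1309.272, 1001.516, 3796.436),
  stringsAsFactors = FALSE
)

# Assumption: the annotation package stores unversioned Ensembl transcript IDs,
# so remove the trailing ".<version>" before looking anything up.
tx_ids <- sub("\\.[0-9]+$", "", test$Name)

# Only attempt the lookup when the Bioconductor packages are installed.
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Cf.eg.db", quietly = TRUE)) {
  # Assumption: "ENSEMBLTRANS" is listed by keytypes(org.Cf.eg.db).
  # mapIds() stops with an error when none of the keys match,
  # so fall back to NA symbols in that case.
  symbols <- tryCatch(
    AnnotationDbi::mapIds(
      org.Cf.eg.db::org.Cf.eg.db,
      keys = tx_ids,
      keytype = "ENSEMBLTRANS",
      column = "SYMBOL",
      multiVals = "first"  # keep one symbol per transcript
    ),
    error = function(e) setNames(rep(NA_character_, length(tx_ids)), tx_ids)
  )
  # Index by name so the symbols line up with the rows of 'test'.
  test$Symbol <- unname(symbols[tx_ids])
}
```

Transcripts without a match would presumably come back as NA in the Symbol column, which would be fine for my purposes.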