3 months ago
min
I'm trying to convert a list of Gene IDs to Gene Symbols. However, it seems that I cannot use library("AnnotationDbi"). I used featureCounts to generate count_data.csv from a GTF file obtained from NCBI. Here is my count_data.csv:
Could someone help me with how to map Gene IDs to Gene Symbols using a different approach?
I would greatly appreciate any guidance.
Please do not paste screenshots of plain-text content; it is counterproductive. You can copy and paste the content directly here (using the code formatting option shown below), or use a GitHub Gist if the content exceeds the length allowed here.
I will pay attention next time.
What species are you looking at? You could use the BioMart data-mining tool from Ensembl - https://www.ensembl.org/biomart/martview. You can upload the file containing your gene IDs and ask BioMart to return the corresponding gene symbols. It is difficult to help you with the AnnotationDbi package since you do not mention what your error is.
Thank you for your response. I am analyzing a strain of S. aureus, but my strain is not available on Ensembl Bacteria.
Hello,
You can obtain gene symbols/gene names from a GFF file, based on the features for which you have read abundance data.
Example: if you quantified read abundance per transcript with featureCounts, you can extract the transcript-related information (transcript IDs, gene IDs, gene names, and descriptions) from the GFF, then match the gene symbols/gene names to the transcripts in your count matrix.
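A rough sketch of that matching step in Python, in case it helps. It assumes an NCBI-style GTF where the name sits in a `gene` (or `gene_name`) attribute, and a count table whose ID column is called `Geneid` as in raw featureCounts output. The function names and the toy records are made up for illustration, so adjust them to your files.

```python
import csv
import io
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_gene_names(gtf_lines, name_keys=("gene", "gene_name")):
    """Map gene_id -> gene name, or None when the GTF gives no name.

    name_keys is an assumption: check which attribute your GTF uses.
    """
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        name = next((attrs[k] for k in name_keys if attrs.get(k)), None)
        if name:
            mapping[gene_id] = name
        else:
            # Remember the ID, but do not overwrite a name seen earlier.
            mapping.setdefault(gene_id, None)
    return mapping


def add_symbols(count_rows, mapping, id_column="Geneid"):
    """Yield count rows with a 'Symbol' column; fall back to the gene ID."""
    for row in count_rows:
        gene_id = row[id_column]
        yield {**row, "Symbol": mapping.get(gene_id) or gene_id}


# Toy data standing in for the real GTF and count_data.csv (hypothetical IDs).
gtf_text = (
    'chr1\tRefSeq\tgene\t1\t1362\t.\t+\t.\tgene_id "GENE_0001"; gene "dnaA";\n'
    'chr1\tRefSeq\tgene\t1500\t2600\t.\t+\t.\tgene_id "GENE_0002";\n'
)
counts_csv = "Geneid,sample1\nGENE_0001,120\nGENE_0002,7\n"

mapping = gtf_gene_names(io.StringIO(gtf_text))
annotated = list(add_symbols(csv.DictReader(io.StringIO(counts_csv)), mapping))
```

For real files, pass `open("your.gtf")` and `csv.DictReader(open("count_data.csv"))` instead of the in-memory strings. Genes without a name keep their ID in the `Symbol` column, so no rows are lost.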
Thank you for your response. I tried this method and it worked. However, many GeneIDs do not include Gene names in the GTF file. How can I resolve this issue?
Then, for those genes, you can try to get gene names from the UniProt database for each gene ID, specific to the strain.
One more option: you can try the bioDBnet website [https://biodbnet-abcc.ncifcrf.gov/db/db2db.php] for organism-specific gene names. The input is the gene IDs and the Tax ID of the organism.
cheers!
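To prepare input for those lookups, something like the following could pull out the IDs that still lack a name and split them into chunks for pasting into a web form. This is only a sketch: the mapping, the helper names, and the batch size of 2 are illustrative assumptions, not limits of UniProt or bioDBnet.

```python
def unnamed_gene_ids(mapping):
    """Return the sorted gene IDs whose name is missing or empty."""
    return sorted(gene_id for gene_id, name in mapping.items() if not name)


def to_batches(ids, size):
    """Split a list of IDs into consecutive chunks of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


# Hypothetical gene_id -> name mapping built from the GTF.
example_mapping = {
    "GENE_0001": "dnaA",
    "GENE_0002": None,
    "GENE_0003": "",
    "GENE_0004": None,
}

missing = unnamed_gene_ids(example_mapping)
batches = to_batches(missing, 2)

# One ID per line, ready to paste into an ID-conversion form.
query_text = "\n".join(batches[0])
```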
I realize that my issue is more complex. I will make a new post and describe it in detail.
Which genome are these IDs from?
Thank you for your response. These IDs are from my GTF file.
If gene IDs are not available in the GTF, you will need to do additional work to figure out what gene it could be. At a minimum, this could involve BLASTing the protein sequences to identify the genes. You could also try tools like LiftoffTools or RATT if annotations are available for a closely related genome.