Hello everyone,
I'm currently analyzing my paired-end RNA-seq data using the sorted-position.bam files and a reference.genome.gtf file.
I used featureCounts with default settings to generate a count matrix file, but the resulting matrix contains only gene IDs, without the corresponding gene names or features.
This isn't necessarily a problem at this stage, since the gene IDs are unique, but I would like to extract the actual names from the GTF file and merge them into the featureCounts CSV file before proceeding with edgeR/DESeq2 analysis.
When I use edgeR and DESeq2 to generate a CSV file of differential gene expression results, only the gene IDs are included in the output. As a result, when I visualize the results with ggplot2, the gene IDs are displayed instead of the actual gene names.
If anyone has any suggestions or solutions, I would greatly appreciate your help. Thank you.
What is the problem, where do you get stuck?
Thank you ATpoint, I have edited the main post to clarify.
You can use BioMart to convert your "gene-IDs" to gene symbols, but you need to specify which database your gene identifiers come from.
Thank you Basti.
I annotated my genome using the NCBI annotation tool, so I believe I should use the NCBI database.
However, the gene IDs look like this: JYU28_08305 JYU28_21540 JYU28_08310
I couldn't find these IDs in any database.
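IDs of the form JYU28_08305 look like locus tags assigned during annotation, which would explain why no external database recognizes them. In that case the names may already be in the same GTF that was given to featureCounts. Below is a minimal Python sketch of that idea, not a definitive solution: it reads the attribute column of the GTF, builds a gene_id to name lookup, and adds a GeneName column to the featureCounts table. The attribute keys `gene`, `gene_name` and `product` are assumptions about what the GTF contains, so check a few lines of your own file first. The function names and the demo rows (including "dnaA" and "hypothetical protein") are invented for illustration; only the three IDs come from this thread.

```python
import csv
import re

# Matches GTF attribute pairs such as: gene_id "JYU28_08305";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_gene_names(gtf_lines, name_keys=("gene", "gene_name", "product")):
    """Map gene_id -> name, preferring earlier entries of name_keys.

    name_keys is an assumption about which attributes carry a readable name.
    """
    best = {}  # gene_id -> (rank of the key used, name)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        for rank, key in enumerate(name_keys):
            value = attrs.get(key)
            if value:
                # Keep this name only if it comes from a higher-priority key
                if gene_id not in best or rank < best[gene_id][0]:
                    best[gene_id] = (rank, value)
                break
    return {gene_id: name for gene_id, (_, name) in best.items()}


def add_gene_names(counts_lines, names):
    """Insert a GeneName column after Geneid in a featureCounts table.

    Genes without a name in the lookup keep their ID as the name.
    """
    rows = [line for line in counts_lines if not line.startswith("#")]
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    out = [[header[0], "GeneName"] + header[1:]]
    for row in reader:
        if row:
            out.append([row[0], names.get(row[0], row[0])] + row[1:])
    return out


# Made-up demo input; replace with open("reference.genome.gtf") etc.
gtf_demo = [
    '#gtf-version 2.2\n',
    'contig1\tGenbank\tgene\t1\t900\t.\t+\t.\tgene_id "JYU28_08305"; gene "dnaA"; locus_tag "JYU28_08305";\n',
    'contig1\tGenbank\tgene\t1000\t1900\t.\t-\t.\tgene_id "JYU28_21540"; locus_tag "JYU28_21540";\n',
    'contig1\tGenbank\tCDS\t1000\t1900\t.\t-\t0\tgene_id "JYU28_21540"; product "hypothetical protein";\n',
]
counts_demo = [
    '# Program:featureCounts\n',
    'Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\n',
    'JYU28_08305\tcontig1\t1\t900\t+\t900\t12\n',
    'JYU28_21540\tcontig1\t1000\t1900\t-\t901\t0\n',
    'JYU28_08310\tcontig1\t2000\t2500\t+\t501\t7\n',
]

names = gtf_gene_names(gtf_demo)
table = add_gene_names(counts_demo, names)
for row in table:
    print("\t".join(row))
```

Keeping Geneid as the first column means the unique IDs can still be used as row names in edgeR/DESeq2, while the GeneName column is carried along as annotation and used for plot labels. Many genes in a freshly annotated genome may have no symbol at all, so falling back to the ID (or the product description) is probably unavoidable for some rows.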