Hello, I have RNA-seq data from a single sample that has no reference in the literature. I quantified gene expression, but because I have nothing to compare it to, I simply created a DGEList with featureCounts and edgeR, converted the values to TPM and ranked the genes from high to low. Now I want to find out which pathway each gene belongs to, and then make a list of the genes that belong to a particular pathway (e.g. the cell cycle). I wanted to use GO or KEGG, but they don't accept a DGEList object (why?). What tool should I use? It would be nice to get an example of the code.
My code right now is something like this:
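library(Rsubread)  # featureCounts
library(edgeR)     # DGEList, calcNormFactors, rpkm
# isGTFAnnotationFile = TRUE is needed when the annotation is a GTF file
fc_result <- featureCounts(files = bam_files, annot.ext = annotation_file, isGTFAnnotationFile = TRUE)
count_matrix <- as.data.frame(fc_result$counts)
y <- DGEList(counts = count_matrix, group = group)
y <- calcNormFactors(y)
rpkm <- rpkm(y, gene.length = fc_result$annotation$Length)
# converted to tpm: rescale each sample's RPKM values to sum to one million
tpm <- t(t(rpkm) / colSums(rpkm)) * 1e6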
Thank you!!!!
What is contained in your annotation_file? What sort of gene identifier does it use?
GO and KEGG annotation depend on (1) a gene identifier system and (2) a database linking gene IDs to annotation. DGELists are not tied to any particular gene ID system but rather allow you to use your own.
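To make those two ingredients concrete, here is a minimal base-R sketch. Everything in it is invented for illustration: the gene names, the TPM values and the gene2pathway table are placeholders, not real annotation. In practice the linking table would have to come from an annotation database that covers your organism and uses the same identifier type as your counts.

```r
# Toy expression table: gene identifiers with TPM values (invented numbers).
expr <- data.frame(
  symbol = c("geneA", "geneB", "geneC", "geneD"),
  tpm    = c(120.5, 98.2, 2500.0, 15.3),
  stringsAsFactors = FALSE
)

# Toy annotation table linking gene identifiers to pathway names.
# Hypothetical content: a real table would come from an annotation database.
gene2pathway <- data.frame(
  symbol  = c("geneA", "geneB", "geneC", "geneA"),
  pathway = c("Cell cycle", "Cell cycle", "Glycolysis", "DNA replication"),
  stringsAsFactors = FALSE
)

# Left join on the shared identifier: every expressed gene is kept,
# and genes without any annotation get NA in the pathway column.
annotated <- merge(expr, gene2pathway, by = "symbol", all.x = TRUE)

# Rank from highest to lowest TPM.
annotated <- annotated[order(-annotated$tpm), ]
print(annotated)
```

The join only works because both tables use the same identifier column. A gene can appear in several rows if it belongs to several pathways, and a gene that is missing from the annotation table (geneD here) stays in the result with NA.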
Thank you for your answer. My annotation file is a GTF file that I downloaded from NCBI and also used for the STAR alignment; it uses gene symbols as identifiers. Which functions do you recommend for retrieving annotations and filtering by pathway? From my understanding, I don't need functions like enrichKEGG, which test whether certain KEGG pathways are overrepresented (enriched) in my list of genes compared to a background set.
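A plain lookup of this kind can be sketched in base R without any enrichment test or background set. The snippet below is only an illustration under stated assumptions: the tpm vector and the links table are invented toy data, and pathway_genes is a hypothetical helper, not a function from any package. With real data, tpm would be one column of the TPM matrix with gene symbols as names, and links would be a gene-to-pathway table obtained for your organism.

```r
# Toy inputs (invented): TPM per gene symbol, and symbol-to-pathway links.
tpm <- c(geneA = 120.5, geneB = 98.2, geneC = 2500.0, geneD = 15.3)
links <- data.frame(
  symbol  = c("geneA", "geneB", "geneC", "geneA"),
  pathway = c("Cell cycle", "Cell cycle", "Glycolysis", "DNA replication"),
  stringsAsFactors = FALSE
)

# One character vector of gene symbols per pathway.
genes_by_pathway <- split(links$symbol, links$pathway)

# Hypothetical helper: the expressed genes of one pathway, ranked by TPM.
pathway_genes <- function(pathway, gene_sets, tpm) {
  genes <- intersect(gene_sets[[pathway]], names(tpm))
  sort(tpm[genes], decreasing = TRUE)
}

cell_cycle <- pathway_genes("Cell cycle", genes_by_pathway, tpm)
print(cell_cycle)
```

Because this only selects and sorts, it makes no statistical statement about the pathway; it just lists which of the expressed genes are annotated to it and how highly each one is expressed.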