Pathway analysis for DGEList
4 months ago
bioinfo1994 ▴ 20

Hello, I have RNA-seq data from a single sample that has no reference in the literature. I quantified gene expression, but because I have nothing to compare it to, I simply built a DGEList with featureCounts and edgeR, converted the values to TPM, and ranked the genes from high to low. Now I want to find out which pathway each gene belongs to, and then make a list of the genes that belong to a particular pathway (e.g. the cell cycle). I wanted to use GO or KEGG, but the tools don't accept a DGEList object (why?). What tool should I use? It would be nice to get a code example.

My code right now is something like this:

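library(Rsubread)  # provides featureCounts()
library(edgeR)     # provides DGEList(), calcNormFactors(), rpkm()

# bam_files, annotation_file (a GTF) and group are defined elsewhere
fc_result <- featureCounts(files = bam_files, annot.ext = annotation_file, isGTFAnnotationFile = TRUE)
count_matrix <- as.data.frame(fc_result$counts)
y <- DGEList(counts = count_matrix, group = group)
y <- calcNormFactors(y)
rpkm_values <- rpkm(y, gene.length = fc_result$annotation$Length)
# converted to tpm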
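
For readers following along, the RPKM-to-TPM step mentioned in the comment can be sketched in base R. This is a minimal illustration on a made-up matrix, not the poster's actual code: `rpkm_mat` and `rpkm_to_tpm` are hypothetical names, and the only assumption is that TPM is obtained by rescaling each sample's RPKM values so that they sum to one million.

```r
# Toy RPKM matrix (made-up values): 3 genes x 2 samples.
rpkm_mat <- matrix(c(10, 30, 60,
                     5, 5, 40),
                   nrow = 3,
                   dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Rescale each column (sample) so that its values sum to one million.
rpkm_to_tpm <- function(x) {
  t(t(x) / colSums(x)) * 1e6
}

tpm_mat <- rpkm_to_tpm(rpkm_mat)

# Rank genes from high to low by their TPM in the first sample.
ranked <- tpm_mat[order(tpm_mat[, "s1"], decreasing = TRUE), , drop = FALSE]
```

With a real analysis, the matrix returned by `rpkm()` would take the place of `rpkm_mat`.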

Thank you!!!!

KEGG Pathway_analysis GO gene_expression DGEList • 528 views

What is contained in your annotation_file? What sort of gene identifier does it use?

GO and KEGG annotation depend on (1) a gene identifier system and (2) a database linking gene IDs to annotation. DGELists are not tied to any particular gene ID system but rather allow you to use your own.
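
To make those two ingredients concrete, here is a minimal base R sketch, assuming the expression table and the annotation table share a gene symbol column. Both tables are made-up toy data, and `expr`, `gene2pathway` and the column names are hypothetical; in practice the second table would come from whichever annotation database matches your gene IDs.

```r
# Ranked expression table keyed by gene symbol (made-up values).
expr <- data.frame(symbol = c("CDK1", "ACTB", "CCNB1", "GAPDH"),
                   tpm = c(120, 900, 80, 1500),
                   stringsAsFactors = FALSE)

# Hypothetical gene-to-pathway table: one row per gene/pathway pair,
# so a gene that sits in two pathways appears twice.
gene2pathway <- data.frame(symbol = c("CDK1", "CCNB1", "GAPDH", "CDK1"),
                           pathway = c("Cell cycle", "Cell cycle",
                                       "Glycolysis", "p53 signaling"),
                           stringsAsFactors = FALSE)

# Left join: keep every expressed gene, even those without an annotation.
annotated <- merge(expr, gene2pathway, by = "symbol", all.x = TRUE)

# Pull out one pathway and rank its genes by expression.
cell_cycle <- annotated[which(annotated$pathway == "Cell cycle"), ]
cell_cycle <- cell_cycle[order(cell_cycle$tpm, decreasing = TRUE), ]
```

The join works on plain data frames, which is why the DGEList itself never needs to be passed to an annotation tool: only the gene IDs in its row names matter.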


Thank you for your answer. My annotation file is a GTF file that I downloaded from NCBI and used for the STAR alignment; it uses gene symbols. Which functions do you recommend for retrieving annotations and filtering by pathway? From my understanding, I don't need functions like enrichKEGG, which test whether certain KEGG pathways are overrepresented (enriched) in my list of genes compared to a background set.

4 months ago
BioinfGuru ★ 2.1k

It sounds like what you want to do is just annotate your genes.

If your gene names are Ensembl IDs, I suggest using biomaRt. Make sure to select the same Ensembl version as the reference genome used for mapping.


Thanks for your response. My genome is from NCBI, and I think I can't really use biomaRt because it's an Ensembl tool. The NCBI genome page has a gene_ontology.gaf.gz file under the FTP link. Would that help? Can I use it? I thought of joining the GTF file to gene_ontology.gaf.gz, but they don't seem to have a gene name column in common (the GTF uses gene symbols, and the gene_ontology.gaf uses a different gene identifier system). Thank you.
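
One way such a join could work is sketched below in base R, under two assumptions that should be checked against the actual files: that the gene rows of the NCBI GTF carry a `db_xref "GeneID:..."` attribute next to `gene_id`, and that the second column of the GAF (the DB object ID) holds the same numeric GeneID. The strings and rows are toy data, and `extract`, `gtf_ids` and `symbol2go` are hypothetical names.

```r
# Toy GTF attribute strings (column 9 of gene rows); this layout is an assumption.
gtf_attr <- c('gene_id "CDK1"; db_xref "GeneID:983"; gbkey "Gene";',
              'gene_id "ACTB"; db_xref "GeneID:60"; gbkey "Gene";')

# Return the first capture group of a regex for each string (NA if no match).
extract <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x))
  vapply(m, function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1))
}

# Bridge table: gene symbol <-> numeric GeneID, both taken from the GTF.
gtf_ids <- data.frame(symbol = extract('gene_id "([^"]+)"', gtf_attr),
                      gene_id = extract('GeneID:([0-9]+)', gtf_attr),
                      stringsAsFactors = FALSE)

# Toy GAF rows reduced to three of the 17 tab-separated GAF columns:
# column 2 (DB object ID), column 5 (GO ID) and column 9 (aspect).
gaf <- data.frame(gene_id = c("983", "983", "60"),
                  go_id = c("GO:0051301", "GO:0005634", "GO:0005856"),
                  aspect = c("P", "C", "C"),
                  stringsAsFactors = FALSE)

# Join on the shared GeneID, then look up the GO terms for one symbol.
symbol2go <- merge(gtf_ids, gaf, by = "gene_id")
cdk1_terms <- symbol2go$go_id[symbol2go$symbol == "CDK1"]
```

With the real files, something like `read.delim(gzfile(path), header = FALSE, comment.char = "!", quote = "")` should read the GAF, since GAF header lines start with `!`; `path` here is a placeholder.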


Please, could you post here the first 5 lines of the counts file?

As long as the genome data and ontology data are from the same organism/build/source then they should match. What organism is it? AnnotationDbi is a great R package for annotating and supports many organisms and NCBI genomes.

I suggest learning what each column of the genome file contains. The first column is not the only one containing a gene identifier; you'll find other types of gene identifier in the other columns.
