We identified DE genes and then performed GO / pathway enrichment analysis (Fisher's exact test is used, right?).
What we are doing now is based on these two ratios:
1: number of genes in a specific GO term / total number of genes
2: number of DE genes in that GO term / total number of DE genes
Someone suggested that we should exclude genes that are not expressed in either group. The denominators would then change as follows:
1: total number of genes -> total number of genes - genes not expressed in either group
2: total number of DE genes -> total number of DE genes - genes not expressed in either group
This suggestion kind of makes sense, considering that genes in the DE list have to be expressed in at least one group.
Can anyone share some comments? Thanks
Take a look at
http://lrpath.ncibi.org/
It might be helpful.
I didn't see anything answering my question in your link. Can you share some comments?
I suggested this because GO and pathway analysis can be done directly from the differential expression data. For instance, you provide raw RNA-seq read counts and the program returns the enriched GO terms and pathways.
Yes. My question is: when working from the raw read counts, should I remove the genes that were not expressed in either group?
If you mean genes with extremely low expression, or genes that are zero in all samples, I am used to removing these genes beforehand. If not, sorry, I don't know.
Yes, that is what I am asking. Until now I did not exclude genes that are not expressed at all. I guess that was wrong.
Yes, just remove them, but do this at the raw count stage, for example by removing all transcripts (genes) whose mean raw count is <10. Genes with a high number of NAs can also be filtered out. The exact filtering applied prior to normalisation and differential expression analysis varies from study to study.
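As a rough illustration of that filtering step, here is a minimal Python sketch. The function name `filter_low_counts`, the NA cutoff `max_na_fraction=0.5` and the toy counts are all assumptions made for this example. Only the mean raw count <10 threshold comes from the comment above, and even that should be tuned per study.

```python
from statistics import mean

def filter_low_counts(counts, min_mean=10, max_na_fraction=0.5):
    """Keep genes whose mean raw count is >= min_mean and that have few NAs.

    counts maps gene ID -> list of raw counts per sample (None marks an NA).
    max_na_fraction is a hypothetical cutoff chosen for illustration.
    """
    kept = {}
    for gene, values in counts.items():
        if not values:
            continue  # no samples at all, nothing to evaluate
        observed = [v for v in values if v is not None]
        na_fraction = 1 - len(observed) / len(values)
        if not observed or na_fraction > max_na_fraction:
            continue  # too many missing values
        if mean(observed) < min_mean:
            continue  # not expressed, or expressed at a very low level
        kept[gene] = values
    return kept

# Made-up raw counts for illustration only (6 samples across two groups).
raw_counts = {
    "geneA": [120, 98, 143, 15, 22, 9],
    "geneB": [0, 0, 0, 0, 0, 0],              # all zero: removed
    "geneC": [3, 1, 0, 2, 4, 1],              # mean below 10: removed
    "geneD": [55, None, None, None, None, 61],  # mostly NA: removed
    "geneE": [30, 41, 28, 35, 39, 44],
}

filtered = filter_low_counts(raw_counts)
print(sorted(filtered))  # ['geneA', 'geneE']
```

The genes that survive this step are the ones that go into normalisation and differential expression analysis, and later form the background for enrichment.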
Then, by the time that you reach the gene enrichment stage, you can have high confidence that the genes that you have included are by default expressed in both groups.
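To see how much the choice of background can matter, here is a small sketch of a one-sided Fisher's exact test (the hypergeometric upper tail) written with the Python standard library only. The function `enrichment_p_value` and the toy gene sets are hypothetical and only meant to show the calculation. A real analysis would normally use an established enrichment package.

```python
from math import comb

def enrichment_p_value(universe, de_genes, term_genes):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= k) under the hypergeometric distribution, where k is the
    number of DE genes annotated to the term. Genes outside the universe
    are ignored.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    term = set(term_genes) & universe
    N, K, n = len(universe), len(term), len(de)
    k = len(de & term)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Toy data, invented for illustration: 20 annotated genes, 10 of them expressed.
all_genes = [f"g{i}" for i in range(1, 21)]
expressed_genes = [f"g{i}" for i in range(1, 11)]
go_term = {"g1", "g2", "g3", "g4", "g11", "g12"}
de_genes = {"g1", "g2", "g3", "g5"}

p_all = enrichment_p_value(all_genes, de_genes, go_term)         # 295/4845, about 0.061
p_expr = enrichment_p_value(expressed_genes, de_genes, go_term)  # 25/210, about 0.119
print(p_all, p_expr)
```

In this toy case, keeping the unexpressed genes in the background gives a smaller p-value than restricting the background to expressed genes. The size and direction of the change will depend on the actual data.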