Hello everyone,
I have finished the differential expression analysis of my RNA-seq data and retrieved the transcription factor (TF) transcripts from both the differentially expressed (DE) unigenes and the non-DE unigenes. I then assigned those TFs to TF families, and I noticed that some TF families seem to contribute more DE TFs than others. Part of my data:
TF_family    No._of_non_DE_TFs    No._of_DE_TFs
AP-2         2                    0
ARID         5                    2
bHLH         67                   8
CG-1         1                    1
COE          2                    0
CP2          4                    0
CSD          4                    0
CSL          1                    0
CUT          6                    1
DDT          1                    0
DM           6                    0
E2F          3                    0
ETS          10                   2
Fork_head    22                   2
..
..
I want to find the TF families that are over-represented in my DE TF set under the treated condition. I have seen several papers that analysed this kind of data with Fisher's exact test. I looked up some material on Fisher's exact test, but the examples all apply it to a 2x2 contingency table, and I still have no idea how to apply it to my data. Could anyone explain how I could do this?
Thanks @Asaf,
Actually, I wanted to thank you in a comment, but in China (where I am now) it is hard to even open the Biostars home page, and it is impossible to comment on other people's responses. So I am thanking you here via "add answer", and I also want to ask you a question.
I used the contingency table as you recommended:
(e.g. for TF family RHD)
The Fisher's exact test result for this table is:
From the result we can see that the right p-value and the two-tailed p-value are both smaller than 0.05. Does that mean that the RHD TF family is over-represented among the DE TF transcripts of the treated group in the transcriptome data?
Which p-value should I use?
Could you explain this to me a bit?
Thanks
Yes, you can say that there are more DE TFs in the RHD family than expected at random. Read a bit about the test and you'll understand what the right and left p-values are (briefly: the right p-value tests whether the top-left cell is larger than expected at random, and the left p-value whether it is smaller).
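To make the right and left p-values concrete, here is a minimal Python sketch of the one-sided tails for a single family. It assumes the 2x2 table is laid out with the DE TFs of the family in the top-left cell, which may not match the exact layout used above. The function name `fisher_tails` is made up for this illustration, and the two totals are placeholders, not real numbers: the table in the question is truncated, so the true totals are unknown. Only the bHLH counts (8 DE, 67 non-DE) come from the question.

```python
from math import comb

def fisher_tails(a, b, c, d):
    """One-sided Fisher's exact test p-values for the 2x2 table

                      in family   not in family
        DE TFs            a             b
        non-DE TFs        c             d

    Returns (p_left, p_right): the probability that the top-left cell is
    <= a (left) or >= a (right) when all row and column totals are fixed.
    """
    row1, col1, col2 = a + b, a + c, b + d
    denom = comb(a + b + c + d, row1)
    lo, hi = max(0, row1 - col2), min(row1, col1)

    def pmf(k):
        # Hypergeometric probability of seeing k in the top-left cell.
        return comb(col1, k) * comb(col2, row1 - k) / denom

    p_left = sum(pmf(k) for k in range(lo, a + 1))
    p_right = sum(pmf(k) for k in range(a, hi + 1))
    return p_left, p_right

# PLACEHOLDER totals (assumptions): replace them with the real column sums
# of the full table, which is truncated in the question.
TOTAL_DE = 40
TOTAL_NON_DE = 400

# bHLH counts taken from the question: 8 DE TFs, 67 non-DE TFs.
de_in_family, non_de_in_family = 8, 67
p_left, p_right = fisher_tails(
    de_in_family, TOTAL_DE - de_in_family,
    non_de_in_family, TOTAL_NON_DE - non_de_in_family,
)
print(f"bHLH: left p = {p_left:.4g}, right p = {p_right:.4g}")
```

With this layout, a small right p-value would point to more DE TFs in the family than expected at random. If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` should give the same right-tail value. When you repeat the test for every family, each family gets its own 2x2 table built this way.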