Hello all,
My goal is to identify the target genes of several transcription factors.
I'd like to work with a dataset derived from ENCODE ChIP-seq analysis (it can be found here: http://ilab.jhsph.edu/database/dataset/HumanRank.tar.gz), where the peaks have already been called and mapped to genes.
For each transcription factor, there's a file with all the targets reported in the ChIP-seq experiment. The targets are ranked by a score (ChIPXpressScore). This is the head of one of these files (targets of EP300):
Rank GeneNames EntrezID ChIPXpressScore
1 FBXO33 254170 10.9
2 TCP11L2 255394 34.8
3 UPF2 26019 38.7
...
The problem is that between 1,000 and 10,000 targets are reported for each transcription factor (1% IDR). I've heard that this is normal in ChIP experiments. What do people usually do in this case? Pick only the top genes on the list? If so, how many?
Thanks in advance!
Thanks Ian, I'll try several thresholds and see if I get nice gene sets to work with.
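For anyone who wants to do the same, here is a minimal sketch of trying several top-N cutoffs. It assumes the files are whitespace-delimited with the header shown above, that gene names contain no spaces, and that Rank 1 is the best target. The function names and the cutoff values are my own placeholders, and the sample data is just the three EP300 rows from the question.

```python
import io

# The three rows shown in the question (targets of EP300); real files are much longer.
SAMPLE = """Rank GeneNames EntrezID ChIPXpressScore
1 FBXO33 254170 10.9
2 TCP11L2 255394 34.8
3 UPF2 26019 38.7
"""

def read_targets(handle):
    """Parse a ranked target file into a list of dicts.

    Assumes whitespace-separated columns and a single header line.
    """
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        row = dict(zip(header, fields))
        row["Rank"] = int(row["Rank"])
        row["ChIPXpressScore"] = float(row["ChIPXpressScore"])
        rows.append(row)
    return rows

def top_n_genes(rows, n):
    """Return the gene names of the n best-ranked targets (assuming Rank 1 is best)."""
    ranked = sorted(rows, key=lambda r: r["Rank"])
    return [r["GeneNames"] for r in ranked[:n]]

targets = read_targets(io.StringIO(SAMPLE))

# Placeholder cutoffs for the 3-row sample; with a real file, values such as
# 100, 500 and 1000 could be compared instead.
gene_sets = {n: top_n_genes(targets, n) for n in (1, 2, 3)}
for n, genes in gene_sets.items():
    print(n, genes)
```

To run it on a real file, replace `io.StringIO(SAMPLE)` with `open(path)` for one of the per-factor files in the archive. Which cutoff gives a useful gene set still has to be judged for each transcription factor.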