Question

How to filter ChIP-seq results?

10.4 years ago

jfrigola1 • 0

Hello all,

My goal is to identify the target genes of several transcription factors.

I'd like to work with a dataset derived from ENCODE ChIP-seq analysis (available here: http://ilab.jhsph.edu/database/dataset/HumanRank.tar.gz), in which peak mapping and similar processing steps have already been done.

For each transcription factor, there's a file with all the targets reported in the ChIP-seq experiment. These targets are ranked by a score (ChIPXpressScore). This is the head of one of these files (targets of EP300):

Rank  GeneNames  EntrezID  ChIPXpressScore
1     FBXO33     254170    10.9
2     TCP11L2    255394    34.8
3     UPF2       26019     38.7
...

The problem is that between 1,000 and 10,000 targets are identified for each transcription factor (1% IDR). I've heard that this is normal in ChIP experiments. What do people do in that case? Pick only the top genes on the list? If so, how many?

Thanks in advance!

ChIP-Seq next-gen • 3.0k views

ADD COMMENT • link updated 3.1 years ago by Ram 44k • written 10.4 years ago by jfrigola1 • 0

Thanks Ian, I'll try several thresholds and see if I get nice gene sets to work with.

ADD REPLY • link 10.4 years ago by jfrigola1 • 0

Ram · Answer 1 · 2014-07-02

With the minimal information you have available, it might be best to choose a threshold value of ChIPXpressScore and build your lists from that. This at least keeps one variable constant across factors. You could then ask which genes are bound by the different factors. I tried to look at the example file, but it no longer appears to be there.
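A minimal sketch of that thresholding idea, using only the three EP300 rows quoted in the question. Several things here are assumptions rather than facts about the dataset: the files are taken to be whitespace-delimited with the header shown above, the cutoff of 35.0 is arbitrary, and the direction of the score is a guess (in the quoted rows the rank 1 gene has the lowest score, so lower is treated as better by default). The function names `read_targets` and `filter_by_score` are made up for illustration.

```python
import io


def read_targets(handle):
    """Parse a whitespace-delimited ranking table into a list of dicts.

    Assumes the four-column layout quoted in the question:
    Rank, GeneNames, EntrezID, ChIPXpressScore.
    """
    rows = []
    header = None
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-empty line is the header
            continue
        if len(fields) != len(header):
            continue  # skip malformed or elided rows such as "..."
        rows.append({
            "Rank": int(fields[0]),
            "GeneNames": fields[1],
            "EntrezID": fields[2],
            "ChIPXpressScore": float(fields[3]),
        })
    return rows


def filter_by_score(rows, cutoff, lower_is_better=True):
    """Keep rows on the 'good' side of the cutoff.

    Which side is 'good' is an assumption; flip lower_is_better
    once the meaning of ChIPXpressScore has been checked.
    """
    if lower_is_better:
        return [r for r in rows if r["ChIPXpressScore"] <= cutoff]
    return [r for r in rows if r["ChIPXpressScore"] >= cutoff]


# The three rows shown in the question (targets of EP300).
EXAMPLE = """Rank GeneNames EntrezID ChIPXpressScore
1 FBXO33 254170 10.9
2 TCP11L2 255394 34.8
3 UPF2 26019 38.7
"""

rows = read_targets(io.StringIO(EXAMPLE))
ep300 = filter_by_score(rows, cutoff=35.0)  # 35.0 is an arbitrary cutoff
print([r["GeneNames"] for r in ep300])
```

Applying the same cutoff to every factor's file gives one gene set per factor, and those sets can then be intersected or compared with ordinary Python `set` operations.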

Have you considered looking at the ENCODE data on the UCSC browser? Most factors have accompanying narrowPeak files, which are more descriptive.
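If you go the narrowPeak route, filtering is straightforward because the format has ten fixed tab-separated columns (chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue, peak), with pValue and qValue stored as -log10 values and -1 meaning "not assigned". The sketch below is only an illustration: the three peak records are invented demo data, not ENCODE results, the function names are hypothetical, and the cutoff of 2.0 (q <= 0.01) is an arbitrary choice.

```python
import io

NARROWPEAK_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score",
    "strand", "signalValue", "pValue", "qValue", "peak",
]


def read_narrowpeak(handle):
    """Parse narrowPeak lines into dicts, skipping track/browser headers."""
    peaks = []
    for line in handle:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        values = line.rstrip("\n").split("\t")
        if len(values) != len(NARROWPEAK_FIELDS):
            continue  # not a well-formed narrowPeak record
        rec = dict(zip(NARROWPEAK_FIELDS, values))
        for key in ("chromStart", "chromEnd", "score", "peak"):
            rec[key] = int(rec[key])
        for key in ("signalValue", "pValue", "qValue"):
            rec[key] = float(rec[key])
        peaks.append(rec)
    return peaks


def filter_by_qvalue(peaks, min_neg_log10_q=2.0):
    """Keep peaks whose -log10(q) meets the cutoff.

    Peaks with qValue == -1 (no q-value assigned) are dropped
    because -1 is below any non-negative cutoff.
    """
    return [p for p in peaks if p["qValue"] >= min_neg_log10_q]


# Invented demo records, for illustration only.
DEMO = "\n".join([
    "\t".join(["chr1", "100", "300", "peak1", "900", ".", "25.3", "12.1", "9.8", "95"]),
    "\t".join(["chr1", "5000", "5200", "peak2", "200", ".", "4.1", "2.5", "1.2", "80"]),
    "\t".join(["chr2", "700", "950", "peak3", "600", ".", "10.0", "6.0", "-1", "120"]),
]) + "\n"

peaks = read_narrowpeak(io.StringIO(DEMO))
strong = filter_by_qvalue(peaks, min_neg_log10_q=2.0)
print([p["name"] for p in strong])
```

Assigning the surviving peaks to genes is a separate step that this sketch does not cover.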