Question

Over representation of TFBS

10.2 years ago

bionovice • 0

Hi guys,

So I am looking to check for over-representation of TFBS. I know this is done using Fisher's exact test, but it doesn't seem to function suitably.

I have my TF hits and their frequencies in the different kinds of treatment. I also have the frequencies for scrambled sequences.

If a certain TFBS has a nonzero frequency, my table has a "yes" value for it; if its frequency is 0, it has a "no" value.

I just cannot seem to crack it. All help will be appreciated. Thank you.

tfbs transcription sequence • 2.6k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by bionovice • 0

Just a note: There has been over-representation of this question on the forum already:

ADD REPLY • link 3.0 years ago by Ram 45k

I know there has, but the reason I'm posting again is that those posts haven't answered my question or helped.

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by bionovice • 0

Hmmm, I see.

ADD REPLY • link 10.2 years ago by Ram 45k

Ram · Answer 1 · 2015-05-14

The result of Fisher's exact test depends on

a) The enrichment
b) The number of TFBS in the genome!
c) The number of candidate genes for enrichment and the number of other genes.

If "it doesn't seem to function suitably", you might have prior knowledge about a TF involved in your context.

You thus might want to do a quick check, whether TFBS enrichment for this factor is reasonable: If you know how many possible TFBS exist for it in the genome you could simulate how strong the enrichment would have to be - given your list of candidate genes for enrichment. Could the necessary enrichment be reached within your experimental setup? Is the amount of enrichment (e.g.: fold change) reasonable compared to enrichments observed in similar biological contexts (e.g.: similar stimulation, similar/same tissue/cell type...)?
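As a rough illustration of that quick check, here is a minimal Python sketch. It assumes a gene-level setup that is not spelled out above: `N` genes in total, `K` of them carrying at least one binding site for the factor, and `n` candidate genes. The function name and the example numbers are hypothetical. It reports the smallest number of candidate genes with a site that would give a one-sided p-value at or below `alpha`.

```python
from math import comb

def min_hits_for_significance(n, K, N, alpha=0.05):
    """Smallest k with P(X >= k) <= alpha under a hypergeometric null.

    N: total genes, K: genes with a TFBS for the factor,
    n: candidate genes. Returns None if no k reaches alpha.
    """
    total = comb(N, n)           # number of ways to pick n genes out of N
    tail = 0                     # running numerator of P(X >= k)
    best = None
    # Walk down from the largest possible overlap, growing the upper tail.
    for k in range(min(K, n), -1, -1):
        tail += comb(K, k) * comb(N - K, n - k)
        if tail / total > alpha:
            break                # the tail is now too heavy; stop
        best = k
    return best

# Hypothetical numbers, for illustration only.
N, K, n = 20000, 1500, 200
k_min = min_hits_for_significance(n, K, N)
expected = n * K / N             # hits expected by chance
if k_min is not None:
    print(f"need >= {k_min} hits, i.e. fold enrichment >= {k_min / expected:.2f}")
```

If the required fold enrichment is far above what similar experiments report, the test is unlikely to flag the factor no matter how it is run.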

Ram · Answer 2 · 2015-05-14

I don't know exactly what you're trying to do, but here's one approach that may fit.

Measure counts of all TF binding sites in regions of interest (e.g., "treatment").
Measure counts of specific-TF-of-interest binding sites in regions of interest.
Measure counts of all TF binding sites across background (e.g., "whole-genome").
Measure counts of specific-TF-of-interest sites across background.

Given the frequencies of observations across the whole genome, the probability of observing at least a certain number of specific-TF-of-interest sites in the regions of interest can be calculated from these four counts using a hypergeometric distribution. You might generate these probabilities for a set of TFs and treatments of interest and compare them to see which TFs are over-represented.
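The four counts above can be plugged into a hypergeometric upper tail, which is the same quantity as a one-sided Fisher's exact test on the corresponding 2x2 table. The sketch below uses only the Python standard library. The function name and the example counts are made up for illustration, and it assumes the regions of interest are a subset of the background.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: all TF binding sites in the background
    K: sites of the TF of interest in the background
    n: all TF binding sites in the regions of interest
    k: sites of the TF of interest in the regions of interest
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)
    # Sum the probability mass for k, k+1, ..., up to the largest possible overlap.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

# Hypothetical counts, for illustration only.
p = hypergeom_upper_tail(k=30, n=400, K=1200, N=50000)
print(f"one-sided p-value: {p:.3g}")
```

When this is repeated for many TFs, the resulting p-values should be corrected for multiple testing before they are compared.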