Identification of Genomic Regions Where Multiple TFs Bind
12.9 years ago
Dataminer ★ 2.8k

Hi!

I have peak-called data for 8 transcription factors (using MACS on BED files).

The format of each file is:
Chr Chr_Start Chr_Stop

Basically three columns.

I want to find the regions where at least 4 TFs bind (any 4).

Note: I already have a union of these regions in a file and have counted tags for each TF in these regions.

Thank you,

chip-seq overlap • 2.9k views
12.9 years ago
Ian Simpson ▴ 960

Well, one of the first things you need to decide is how you define a 'region': fixed size, minimum TF density, etc. Once you have decided this (albeit fairly arbitrarily), it's simply a case of windowing across the sequences and keeping running totals for the TFs in the bins. You can then summarise the window counts across the 8 TFs and keep only the windows where at least 4 of them have a non-zero count.

If I were doing this I would hack together a quick Perl script to do the job. I wouldn't think this would take too long to do if you're familiar with scripting.
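The windowing idea described above can be sketched roughly as follows (in Python rather than Perl). This is only a minimal sketch under some assumptions: fixed-size, non-overlapping windows, half-open BED coordinates, and a "hit" meaning that a peak of that TF touches the window. The function names, the 500 bp default window, and the `peaks_by_tf` dictionary are illustrative choices, not anything from the original post.

```python
from collections import defaultdict


def read_bed3(path):
    """Read the first three columns (chrom, start, stop) of a BED file."""
    peaks = []
    with open(path) as fh:
        for line in fh:
            # Skip blank lines and common BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, stop = line.split()[:3]
            peaks.append((chrom, int(start), int(stop)))
    return peaks


def windows_with_min_tfs(peaks_by_tf, window=500, min_tfs=4):
    """Return fixed-size windows touched by peaks of at least `min_tfs` TFs.

    peaks_by_tf: dict mapping a TF name to a list of (chrom, start, stop)
    tuples in half-open BED coordinates (assumed).
    Result: sorted list of (chrom, window_start, window_end, n_tfs).
    """
    tfs_in_bin = defaultdict(set)  # (chrom, bin index) -> set of TF names
    for tf, peaks in peaks_by_tf.items():
        for chrom, start, stop in peaks:
            if stop <= start:
                continue  # ignore empty or malformed intervals
            first_bin = start // window
            last_bin = (stop - 1) // window  # stop is exclusive
            for b in range(first_bin, last_bin + 1):
                tfs_in_bin[(chrom, b)].add(tf)
    hits = [
        (chrom, b * window, (b + 1) * window, len(tfs))
        for (chrom, b), tfs in tfs_in_bin.items()
        if len(tfs) >= min_tfs
    ]
    return sorted(hits)
```

With real data you would build the dictionary with something like `{name: read_bed3(path) for name, path in files.items()}`, where `files` is a hypothetical mapping from TF name to BED path. Note that the result depends on the arbitrary window size and on where the window boundaries fall, which is the "how do you define a region" question again.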


@Ian: I like a good Perl hack myself - still, interval logic is best dealt with through a library/module. It's not quite as sinister as regex for XML, but I've tried it from scratch and there are a number of gotchas that make anything quick/throw-away prone to error.
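To make a couple of those gotchas concrete, here is a small sweep-line sketch in Python that finds the exact stretches covered by peaks of at least `min_tfs` distinct TFs. It is an illustration under assumptions (half-open BED coordinates, hypothetical function and variable names), not a replacement for a tested interval library. The two traps it tries to handle are overlapping peaks from the same TF, which must count once, and a peak ending at the same position where another starts, which should not split a region.

```python
from collections import defaultdict
from itertools import groupby


def regions_with_min_tfs(peaks_by_tf, min_tfs=4):
    """Return (chrom, start, stop) stretches covered by >= min_tfs distinct TFs.

    peaks_by_tf: dict mapping a TF name to a list of (chrom, start, stop)
    tuples in half-open BED coordinates (assumed).
    """
    events = defaultdict(list)  # chrom -> list of (position, +1/-1, tf)
    for tf, peaks in peaks_by_tf.items():
        for chrom, start, stop in peaks:
            if stop > start:  # ignore empty or malformed intervals
                events[chrom].append((start, 1, tf))
                events[chrom].append((stop, -1, tf))

    regions = []
    for chrom in sorted(events):
        depth = defaultdict(int)  # tf -> number of its peaks currently open
        region_start = None
        # Apply ALL events at one position before testing the threshold,
        # so an end and a start at the same base do not split a region.
        for pos, group in groupby(sorted(events[chrom]), key=lambda e: e[0]):
            for _, delta, tf in group:
                depth[tf] += delta
            # Count distinct TFs, not peaks: overlapping peaks of one TF
            # contribute only once.
            n_active = sum(1 for d in depth.values() if d > 0)
            if n_active >= min_tfs and region_start is None:
                region_start = pos
            elif n_active < min_tfs and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions
```

One design consequence worth knowing: a reported region guarantees at least `min_tfs` TFs at every base, but the particular set of TFs can change along the region.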

