multiIntersectBed: how can I keep ChIP-seq peak values with the overlaps?
9.2 years ago
morovatunc ▴ 560

Hi!,

I am trying to find the consensus peaks across 8 samples of ChIP-seq peak data. The format is BED. I have no problem finding the overlapping peak coordinates, thanks to the bedtools multiinter function. However, when it reports the overlaps across the 8 samples, the peak values are dropped. I just want to normalise or average the peak values for each overlapping peak across the samples.

I am looking at the effect of transcription factors on chromosomal rearrangements, which is why I need those TF binding coordinates.

PS: I have used other overlap tools such as HOMER and bedops. None of them seems to have this value-normalisation feature. This cannot be a coincidence?

Thank you very much for your help.

Best regards

Tunc

ChIP-Seq bedtools • 2.9k views
9.2 years ago

Take a look at bedmap --mean to calculate the mean signal over mapped regions. Other numerical operators are available, depending on what statistics you need.


Alex, thank you for your reply. I tried bedmap --mean and similar numerical operations, but I couldn't figure out how to do that for multiple files. Could you help me out with that, please?


You could use bedops --everything or bedops -u to do a multiset union of regions with signal of interest. Then pass this to bedmap --mean as the map file, e.g.:

bedops -u signal1.bed signal2.bed ... signalN.bed > signals.bed
bedmap --echo --mean regions.bed signals.bed > regions_with_mean_overlapping_signal.bed

Dear Alex, just to confirm the code:

Let's say I have 3 samples:

sample1

ch1 100 200 23

sample2

ch1 102 220 30

sample3

ch1 99 210 10

The output of $ bedops -u * > signals.bed would be

signals.bed

ch1 99 210 10
ch1 100 200 23
ch1 102 220 30

And let's say regions.bed holds the overlapping regions from previous runs, with no peak values, obviously:

ch1 101 230

and $ bedmap --echo --mean regions.bed signals.bed > result.bed

result.bed

ch1 101 230 21

So basically, it takes the peak location from regions.bed, takes the mean of the values of the overlapping elements, and writes that out as the value?

Sorry for asking in such a long way, but I wanted to be very sure about the output, because it will affect my project results deeply. And I apologise for my bad English.


Your three sample files are not true BED but bedgraph, because the signal is in the fourth column and not the fifth. The BED specification calls for signal in the fifth column. You can use awk to preprocess your samples into true BED, e.g.:

$ awk '{ print $1"\t"$2"\t"$3"\tid-"NR"\t"$4 }' signals.bed > signals.fixed.bed

Then you can use the fixed file in a bedmap statement. It will work as you expect, taking the mean of the signal of "sample" elements that overlap each region.
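As a quick illustration of that preprocessing step, here is a minimal sketch run on the three example elements from this thread. The input lines are retyped by hand (tab-separated and ordered by start coordinate, which is an assumption about the real files), and the id-N names are placeholders only:

```shell
# Hypothetical four-column input: chrom, start, end, signal (values from the thread)
printf 'ch1\t99\t210\t10\nch1\t100\t200\t23\nch1\t102\t220\t30\n' > signals.bed

# Insert a placeholder ID in column 4 and move the signal to column 5
awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, "id-" NR, $4 }' signals.bed > signals.fixed.bed

# Show the converted five-column file
cat signals.fixed.bed
```

Setting OFS keeps the output tab-delimited without spelling out each "\t"; the result should be the same as with the one-liner above.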

The BEDOPS bedmap docs explain this in more detail and include an example that demonstrates its use with generic signal data (DNaseI data, but any numeric signal data works).
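If you want to sanity-check the number in your example by hand, the overlap-and-average step can be mimicked in plain awk. This is only a rough stand-in for illustration, not bedmap itself: it assumes a single region, half-open BED intervals, and the signal already moved to the fifth column. The file contents are the hypothetical values from this thread.

```shell
# Hypothetical five-column sample elements: chrom, start, end, id, signal
printf 'ch1\t99\t210\tid-1\t10\nch1\t100\t200\tid-2\t23\nch1\t102\t220\tid-3\t30\n' > signals.fixed.bed
# Single region of interest from the thread
printf 'ch1\t101\t230\n' > regions.bed

# Plain-awk stand-in for the overlap-and-average step (NOT bedmap itself).
# First file: the region. Second file: the signal elements.
mean=$(awk -F'\t' '
  # Remember the coordinates of the single region
  NR == FNR { r_chr = $1; r_start = $2; r_end = $3; next }
  # Half-open interval overlap test; accumulate the signal in column 5
  $1 == r_chr && $2 < r_end && $3 > r_start { sum += $5; n++ }
  # Print the mean only when at least one element overlapped
  END { if (n > 0) printf "%.6f", sum / n }
' regions.bed signals.fixed.bed)

echo "$mean"
```

All three elements overlap ch1:101-230, so this prints 21.000000, matching (23 + 30 + 10) / 3 = 21 from your example.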


Dear Alex, I left my 4th column out of the example. It holds the unique peak names (like peak_2495). Thank you very much for your help. I owe you big time!


You're welcome!
