Dear all,
I have a basic question about ChIP-seq BED files. I have been working with BED files for a couple of months, but I have realised that they were already processed, and I would like to know whether there is a gold-standard method for producing processed BED files. Most ChIP-seq papers describe this step only briefly, for example "we used macs14 to process the ChIP-seq fastq files", and nothing more, so I feel a little lost.
This is the head of one of the BED files that I consider to be of OK quality:
chr1 644044 644249 peak1 8.05838
chr1 831530 831681 peak1 3.61544
chr1 900849 901108 peak2 6.77098
chr1 931535 931798 peak3 5.76549
chr1 960454 960776 peak4 7.79623
chr1 967782 967928 peak5 3.42912
chr1 967933 968142 peak1 8.01545
chr1 1015395 1015544 peak2 6.16261
chr1 1062523 1062669 peak6 4.58519
chr1 1114526 1114795 peak3 7.18694
chr1 1133651 1133837 peak7 4.70157
chr1 1133974 1134157 peak8 6.35043
chr1 1225207 1225360 peak4 3.65434
chr1 1233033 1233258 peak9 10.23716
chr1 1240866 1241022 peak5 4.38862
chr1 1269218 1269485 peak10 8.46963
This is one of the BED files that I want to learn how to process. I guess these are reads, and that bedtools merge can be used to count them?
==> sorted_GSM1442789_mock_p300.bed <==
chr1 10175 10225 SN608VA04562315268.70401.50 255 +
chr1 10238 10288 SN608VA04562307401.904854.30 255 +
chr1 17461 17511 SN608VA04552313922.002700.10 255 +
chr1 17470 17520 SN608VA04551207701.404148.50 255 +
chr1 87067 87117 SN608VA045511021679.002497.60 255 +
chr1 100632 100682 SN608VA04551216367.905873.00 255 +
chr1 150554 150604 SN608VA04552305857.108801.20 255 +
chr1 532437 532487 SN608VA04562204347.808268.70 255 +
chr1 533139 533189 SN608VA04552209399.109424.70 255 +
chr1 533139 533189 SN608VA045512071736.301060.60 255 +
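One quick way to sanity-check the guess that these are reads is to look at the interval widths: reads of a fixed length should all span the same number of bases, while called peaks vary in width. Below is a minimal Python sketch of that check. The function name `interval_widths` is hypothetical, and the two small samples are copied from the files shown above. A uniform width is consistent with reads but does not prove it.

```python
from collections import Counter


def interval_widths(bed_lines):
    """Count how often each interval width (end - start) occurs."""
    widths = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[1]), int(fields[2])
        widths[end - start] += 1
    return widths


# First lines of the read-like file shown above.
reads = [
    "chr1 10175 10225 SN608VA04562315268.70401.50 255 +",
    "chr1 10238 10288 SN608VA04562307401.904854.30 255 +",
    "chr1 17461 17511 SN608VA04552313922.002700.10 255 +",
]

# First lines of the processed peak file shown earlier.
peaks = [
    "chr1 644044 644249 peak1 8.05838",
    "chr1 831530 831681 peak1 3.61544",
    "chr1 900849 901108 peak2 6.77098",
]

print("reads:", dict(interval_widths(reads)))
print("peaks:", dict(interval_widths(peaks)))
```

In these samples every read-like entry is 50 bp wide, whereas the peak entries have three different widths.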
Also, here is one more sample BED file, on which I used merge as suggested in this post. I ran bedtools merge -d (some number of base pairs) to make it more "concentrated", then removed all peaks with more than 100 counts, but I would be very glad if you could point me to gold standards or statistical methods for processing this data.
chr1 7325 7361 r_1 1 -
chr1 7334 7370 r_2 2 -
chr1 90496 90532 r_3 1 -
chr1 523003 523039 r_4 2 +
chr1 554319 554355 r_5 1 -
chr1 554321 554357 r_6 10 -
chr1 554322 554358 r_7 2 -
chr1 554323 554359 r_8 11 +
chr1 554323 554359 r_9 2 -
chr1 554324 554360 r_10 19 -
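To make clear what I mean by merging with a distance, here is a minimal pure-Python sketch of the idea. It only loosely mirrors bedtools merge -d and is not a reimplementation of it. The function name `merge_intervals` and the parameter `max_gap` are hypothetical, and the coordinates are the ten entries shown above with the name, count, and strand columns dropped.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge (chrom, start, end) intervals that overlap or lie within max_gap bp.

    Returns a list of (chrom, start, end, n_merged) tuples, where n_merged
    is the number of input intervals collapsed into each merged region.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        same_chrom = bool(merged) and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] <= max_gap:
            # Extend the current region and count one more input interval.
            _, m_start, m_end, n = merged[-1]
            merged[-1] = (chrom, m_start, max(m_end, end), n + 1)
        else:
            # Start a new region.
            merged.append((chrom, start, end, 1))
    return merged


reads = [
    ("chr1", 7325, 7361),
    ("chr1", 7334, 7370),
    ("chr1", 90496, 90532),
    ("chr1", 523003, 523039),
    ("chr1", 554319, 554355),
    ("chr1", 554321, 554357),
    ("chr1", 554322, 554358),
    ("chr1", 554323, 554359),
    ("chr1", 554323, 554359),
    ("chr1", 554324, 554360),
]

for region in merge_intervals(reads, max_gap=0):
    print(*region)
```

This only groups nearby intervals and counts them. It does not model background signal, so it is not a substitute for a statistical peak caller such as the macs14 tool mentioned above.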
Please teach me how to fish! :)
Best,
Tunc.
BED is a pretty generic format. Can you describe what your actual goal is? There's no single way to process stuff like this (and I would never even create the second one you showed).
My aim is to compare the binding of a transcription factor across different tissues. Later on I will annotate those regions based on their functions.
1) Right now my BED files are very large (~500 MB) because the peaks are narrow (<50 bp) and low. So I think I should merge them with bedtools merge -d and filter out the low peaks. I need some publications that have studied this kind of filtering or enrichment to reduce the noise. (I have to show my PI that what I am doing is based on sound logic or a previous publication.)
2) I did not understand why the second type of BED file exists. My best guess is that its entries are just the locations of the reads, so I wanted to validate that guess.
Thank you for your help,
Tunc.