Hello everyone, could you please share your insights on how to split a bedGraph file into genomic bins of equal size? I have average log2(fold enrichment) values calculated for a ChIP over input, as follows [columns: 1) chromosome name; 2) start; 3) end; 4) log2 value]:
chr1 0 450 0
chr1 450 500 1.4033
chr1 500 650 1.79393
chr1 650 700 0.865939
chr1 700 950 0
chr1 950 1000 0.865939
Now I want to expand this file so that the values are reported for fixed 50 bp windows instead of windows of non-uniform length. As you can see, wherever adjacent positions share the same log2 value, they have been merged into one large window (for example, I want to split the 0-450 interval into 9 windows of 50 bp each).
I want to do this so that I can then use two such log2 ratio files (corresponding to two ChIPs) to make a correlation plot. I am new to NGS data analysis, so any and all help is appreciated!
Guidance on how to do this with a Python script would be highly appreciated.
Thank you.
Hi Venu,
Thanks a lot for your reply. Just to clarify: in step 1), do you mean making 50 bp windows from the bedGraph file that I already have? If yes, do I then get rid of the fourth column?
Thank you.
Not from the bedGraph file. Check the bedtools makewindows function. The windows would be made from a chromosome sizes file.
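Since the question asked for Python, here is a minimal sketch of a pure-Python alternative to the bedtools route. It is not what bedtools does internally. The function names `parse_bedgraph` and `rebin_bedgraph` are made up for illustration. It assumes that each 50 bp window should get the length-weighted mean of the intervals overlapping it, and that bases not covered by any interval count as 0. With input like the example above, where every boundary is a multiple of 50, this simply splits each interval into 50 bp pieces carrying the same value.

```python
def parse_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph text lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def rebin_bedgraph(records, bin_size=50):
    """Re-bin bedGraph records into fixed-size windows.

    Assumption: a window's value is the length-weighted mean of the
    intervals overlapping it, with uncovered bases counted as 0.
    """
    sums = {}  # (chrom, bin index) -> sum of value * overlapping bases
    for chrom, start, end, value in records:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            lo = max(start, b * bin_size)
            hi = min(end, (b + 1) * bin_size)
            key = (chrom, b)
            sums[key] = sums.get(key, 0.0) + value * (hi - lo)
    # Dicts keep insertion order, so sorted input gives sorted output.
    return [
        (chrom, b * bin_size, (b + 1) * bin_size, total / bin_size)
        for (chrom, b), total in sums.items()
    ]


# The example intervals from the question.
sample = """chr1 0 450 0
chr1 450 500 1.4033
chr1 500 650 1.79393
chr1 650 700 0.865939
chr1 700 950 0
chr1 950 1000 0.865939"""

rows = rebin_bedgraph(parse_bedgraph(sample.splitlines()), bin_size=50)
for chrom, start, end, value in rows:
    print(f"{chrom}\t{start}\t{end}\t{value:g}")
```

Running the same function on both ChIP files gives windows with identical coordinates, so the fourth columns can be paired up for the correlation plot. Note that the last window on a chromosome may extend past the chromosome end if the final interval is not a multiple of the bin size. Windows that no interval touches are not emitted at all, which is one reason the chromosome sizes file approach is more robust.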
Thank you very much! It worked!