I have a list of genomic intervals of interest and would like to calculate the fraction of each chromosome arm they make up.
For example,
chr start stop
1 50,000,000 100,000,000
1 120,000,000 150,000,000
Using chromosome arm size information (e.g. chr1p spans 0-123,400,000 and chr1q spans 123,400,000-248,956,422), I first need to split the intervals at the chromosome arm boundaries, like:
chr start stop arm
1 50,000,000 100,000,000 p
1 120,000,000 123,400,000 p
1 123,400,000 150,000,000 q
Then I will merge the intervals on the same chromosome and arm and calculate the fraction of the chromosome arm they make up. Do you have any suggestions on how to split the intervals? Is there an easy function/way, or do I need to write a script? Thanks.
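For illustration, here is a minimal Python sketch of the split, merge and fraction steps described above. It is not a definitive implementation: the ARMS dictionary, the function names split_by_arm and arm_fractions, and the use of half-open coordinates are all assumptions made for this example, and only the chr1 boundaries quoted in the question are filled in. Other chromosomes would need their own entries, for instance from a centromere or cytoband table.

```python
# Arm boundaries for chr1 as quoted in the question (hypothetical lookup table;
# extend it with the other chromosomes before using it on real data).
ARMS = {
    "1": {"p": (0, 123_400_000), "q": (123_400_000, 248_956_422)},
}


def split_by_arm(intervals, arms=ARMS):
    """Clip each (chrom, start, stop) interval to every arm it overlaps."""
    pieces = []
    for chrom, start, stop in intervals:
        for arm, (arm_start, arm_stop) in arms[chrom].items():
            lo, hi = max(start, arm_start), min(stop, arm_stop)
            if lo < hi:  # keep only non-empty overlaps
                pieces.append((chrom, lo, hi, arm))
    return pieces


def arm_fractions(pieces, arms=ARMS):
    """Merge overlapping pieces per (chrom, arm); return covered fraction."""
    by_arm = {}
    for chrom, start, stop, arm in pieces:
        by_arm.setdefault((chrom, arm), []).append((start, stop))

    fractions = {}
    for (chrom, arm), spans in by_arm.items():
        covered = 0
        cur_start = cur_stop = None
        for start, stop in sorted(spans):
            if cur_stop is None or start > cur_stop:
                # Gap before this span: close the previous merged block.
                if cur_stop is not None:
                    covered += cur_stop - cur_start
                cur_start, cur_stop = start, stop
            else:
                # Overlapping or touching span: extend the current block.
                cur_stop = max(cur_stop, stop)
        covered += cur_stop - cur_start
        arm_start, arm_stop = arms[chrom][arm]
        fractions[(chrom, arm)] = covered / (arm_stop - arm_start)
    return fractions


# The two example intervals from the question.
intervals = [("1", 50_000_000, 100_000_000), ("1", 120_000_000, 150_000_000)]
pieces = split_by_arm(intervals)
for piece in pieces:
    print(*piece)
for (chrom, arm), frac in arm_fractions(pieces).items():
    print(chrom, arm, round(frac, 4))
```

On the example input, split_by_arm yields the same three rows as the table above: the second interval is cut at 123,400,000 into a p piece and a q piece. Merging before summing matters only when input intervals overlap, but it keeps the fraction from exceeding 1 in that case.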
Thank you, it works just as I wanted! Just a note: maybe add rm interval.*[pq] at the end of the bash script to clean up the temporary interval files.