Splitting genomic intervals based on chromosome arm
1
0
Entering edit mode
5.0 years ago
jtull • 0

I have a list of genomic intervals of interest and I am interested in calculating fraction of chromosome arm they make up.

For example,

chr     start      stop
1         50,000,000     100,000,000
1         120,000,000     150,000,000

And using chromosome arm size information (e.g. chr1p spans region 0-123,400,000, chr1q spans between 123,400,000 and 248,956,422), For this, first I need to split the intervals using chromosome arm boundaries, like:

chr  start   stop arm
1     50,000,000     100,000,000   p
1     120,000,000     123,400,000    p
1      123,400,000      150,000,000   q

Then I will merge the ones on the same chromosome and arm and calculate the fraction of the chromosome arm they make up. Do you have any suggestions on how to split the intervals? Is there an easy function/way or I need to write a script? Thanks.

R • 1.1k views
ADD COMMENT
2
Entering edit mode
5.0 years ago
Jianyu ▴ 580

I think you can use bedtools intersect to compare your intervals with each chromosome arm, then annotate each output with the arm used.

A simple bash example:

old_IFS=$IFS
IFS=$'\n'
for i in `cat arm.bed`; do
        name=`echo $i | cut -f 1`
        echo $i | sed 's/\([0-9]*\)[a-z]/\1/g' | bedtools intersect -a interval.bed -b - > interval.$name
        sed -i 's/$/\t'$name'/g' interval.$name
        cat interval.$name >> output.interval
done
IFS=${old_IFS}

interval.bed:

1   50000000    100000000
1   120000000   150000000

arm.bed:

1p  0   123400000
1q  123400000   248956422

output.interval:

1   50000000    100000000   1p
1   120000000   123400000   1p
1   123400000   150000000   1q

Note: I assume the chromosome arm name = chromosome name (which consists of number) + a character

ADD COMMENT
0
Entering edit mode

Thank you, it works just as I wanted! Just a note, maybe add rm interval.*[pq] in the end of the bash script to clean up the temporary interval files.

ADD REPLY

Login before adding your answer.

Traffic: 1716 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6