Hi all,
I want to do a simple calculation. I have two files. The first is a BED-format file that gives the ranges of several blocks within a chromosome, like:
chrom chromStart chromEnd
Chr1 715416 3431775
Chr1 3700258 5952874
Chr1 6081076 7205282
Chr1 7234954 8428511
Chr1 8997448 9100392
Chr1 9249530 9839305
Chr1 10677947 10733706
Chr1 10803957 1100682
Chr1 11061862 11205652
Chr1 15473705 16122685
The other file contains a series of Dxy estimates from a sliding-window approach, for example:
CHR Start End Dxy
Chr1 705001 710000 0
Chr1 710001 715000 0
Chr1 715001 720000 22.125
Chr1 720001 725000 19.625
Chr1 725001 730000 2.625
Chr1 730001 735000 14.625
Chr1 735001 740000 10.375
Chr1 740001 745000 21.75
Chr1 745001 750000 7.75
I wish to sum the Dxy values (column 4, i.e. $4 in awk) in the second file over each block range in the first file.
Since I already filtered out the SNPs that were not located in the blocks in the previous step, it is OK to just sum the windows that overlap each block.
My first guess is to use awk, but I am having trouble getting started. Any suggestions would be appreciated.
Best,
CWL
Have you tried bedtools or GenomicRanges?
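
If you would rather script it yourself, here is a minimal Python sketch of the overlap-and-sum described in the question. It is only an illustration under some assumptions: both files are whitespace-delimited with a single header line, the block file uses BED coordinates (0-based start, end exclusive), and the window file uses 1-based inclusive coordinates, as the 705001-710000 style suggests. The function names `parse_table` and `sum_dxy_per_block` are made up for this example.

```python
from collections import defaultdict


def parse_table(text):
    """Parse whitespace-delimited rows, skipping the single header line.

    Returns tuples of (chrom, start, end, value); value is None when the
    row has no fourth column (as in the block file).
    """
    rows = []
    for line in text.strip().splitlines()[1:]:
        chrom, start, end, *rest = line.split()
        value = float(rest[0]) if rest else None
        rows.append((chrom, int(start), int(end), value))
    return rows


def sum_dxy_per_block(blocks, windows):
    """Sum the Dxy of every window that overlaps each block.

    Assumption: blocks are 0-based half-open (BED), windows are 1-based
    inclusive. A window that only partly overlaps a block is counted in full.
    """
    by_chrom = defaultdict(list)
    for chrom, w_start, w_end, dxy in windows:
        by_chrom[chrom].append((w_start, w_end, dxy))

    totals = []
    for chrom, b_start, b_end, _ in blocks:
        total = sum(
            dxy
            for w_start, w_end, dxy in by_chrom[chrom]
            if w_start <= b_end and w_end > b_start
        )
        totals.append((chrom, b_start, b_end, total))
    return totals


# Small demo using a few rows copied from the question.
blocks_text = """chrom chromStart chromEnd
Chr1 715416 3431775
Chr1 3700258 5952874
"""
windows_text = """CHR Start End Dxy
Chr1 705001 710000 0
Chr1 710001 715000 0
Chr1 715001 720000 22.125
Chr1 720001 725000 19.625
Chr1 725001 730000 2.625
"""

for row in sum_dxy_per_block(parse_table(blocks_text), parse_table(windows_text)):
    print(*row, sep="\t")
```

For real files, read them with `open(path).read()` and pass the text to `parse_table`. This scans every window on the chromosome for every block, which is fine for a few thousand rows but slow for very large inputs. On the bedtools side, `bedtools map -a blocks.bed -b windows.bed -c 4 -o sum` should give the same kind of per-block sum, provided the header lines are removed and both files are sorted by chromosome and start.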