Matching genomic intervals in two bed files
4.9 years ago
arsala521 ▴ 50

Hello everyone,

I am new to shell scripting and need to write a script for a task. I have looked at many relevant posts but couldn't figure out a solution.

I have two files, each with three columns. The first column is the chromosome name, and the second and third columns give the start and end of a region on that chromosome. For example, Chr2 2 10 means positions 2 to 10 on Chr2. File1 looks like this:

Chr1    4    10
Chr1    3    8
Chr2    15   30
Chr5    1    20

and file2, also with three columns, looks like this:

Chr1    2    30
Chr2    5    40
Chr3    2    10

I want each line of file2 to be printed once for every file1 region that lies within its region. In this example, line 1 of file2 is printed twice because two file1 regions (lines 1 and 2) fall within it; line 2 of file2 is printed once because only one file1 region (line 3) falls within it; and line 3 of file2 is not printed because no file1 region falls within it.

The expected output is:

Chr1    2    30
Chr1    2    30
Chr2    5    40

Please let me know if the question is unclear. Any help is highly appreciated.

Thank you

shell scripting matching column between two files • 783 views

Take a look at bedtools (https://bedtools.readthedocs.io/en/latest/) and bedops (https://bedops.readthedocs.io/en/latest/) for these kinds of operations. You should not need to write any additional code beyond using the correct commands from these packages.
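If installing one of those packages is not an option, the containment check can also be sketched in plain awk. This is only a minimal sketch under a few assumptions: the files are whitespace-delimited with exactly three columns, "within" means the file1 start is >= the file2 start and the file1 end is <= the file2 end on the same chromosome, and file1 is non-empty and small enough to hold in memory. The names count_contained, workdir, file1.bed and file2.bed are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample files reproducing the example from the question.
workdir=$(mktemp -d)
printf 'Chr1\t4\t10\nChr1\t3\t8\nChr2\t15\t30\nChr5\t1\t20\n' > "$workdir/file1.bed"
printf 'Chr1\t2\t30\nChr2\t5\t40\nChr3\t2\t10\n' > "$workdir/file2.bed"

# count_contained FILE1 FILE2
# Prints each FILE2 line once for every FILE1 interval that lies
# entirely inside it (same chromosome, start >= start, end <= end).
count_contained() {
  awk '
    # First file: remember every interval (the +0 forces numeric values).
    NR == FNR { n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next }
    # Second file: compare the current line against every stored interval.
    {
      for (i = 1; i <= n; i++)
        if (chr[i] == $1 && s[i] >= $2 + 0 && e[i] <= $3 + 0)
          print
    }
  ' "$1" "$2"
}

count_contained "$workdir/file1.bed" "$workdir/file2.bed"
```

This compares every file2 line against every file1 interval, so it is fine for small files but slow for large ones. For real data, something like `bedtools intersect -a file2 -b file1 -wa -F 1.0` should, as far as I understand the options, report the same lines far more efficiently, but check the bedtools documentation for your version before relying on it.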
