Hello everyone,
I am new to shell scripting and need to write code for a task. I have looked at many related posts but couldn't figure out a solution.
I have 2 files with three columns each. The first column is the chromosome, and the 2nd and 3rd columns give a region on that chromosome; for example, Chr2 2 10 means positions 2 to 10 on Chr2. Here is file1:
Chr1 4 10
Chr1 3 8
Chr2 15 30
Chr5 1 20
And here is file2, also with three columns:
Chr1 2 30
Chr2 5 40
Chr3 2 10
I want each file2 line to be printed once for every file1 region that lies entirely within it (on the same chromosome). In this example, line 1 of file2 is printed twice, because two file1 regions (lines 1 and 2) fall inside it; line 2 of file2 is printed once, because only one file1 region (line 3) falls inside it; and line 3 of file2 is not printed at all, because no file1 region falls inside it.
The expected output is:
Chr1 2 30
Chr1 2 30
Chr2 5 40
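A minimal sketch of the rule described above, in Python rather than shell, assuming whitespace-separated "chrom start end" lines and treating "within" as full containment (file1 start >= file2 start and file1 end <= file2 end, on the same chromosome). The helper names `parse_regions` and `contained_copies` are hypothetical, and the example data is inlined so the script runs as-is:

```python
def parse_regions(lines):
    """Parse whitespace-separated 'chrom start end' lines into (chrom, start, end) tuples."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def contained_copies(file1_regions, file2_regions):
    """Return each file2 region once per file1 region lying fully inside it."""
    out = []
    for chrom, start, end in file2_regions:
        for c1, s1, e1 in file1_regions:
            # Assumed containment rule: same chromosome, file1 interval inside file2 interval.
            if c1 == chrom and s1 >= start and e1 <= end:
                out.append((chrom, start, end))
    return out


# Example data from the question; replace with open("file1") / open("file2") for real files.
file1 = ["Chr1 4 10", "Chr1 3 8", "Chr2 15 30", "Chr5 1 20"]
file2 = ["Chr1 2 30", "Chr2 5 40", "Chr3 2 10"]

for chrom, start, end in contained_copies(parse_regions(file1), parse_regions(file2)):
    print(chrom, start, end)
```

This compares every file2 line against every file1 line, which is fine for small files but slows down quickly as both files grow.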
Please let me know if the question is unclear. Any help is highly appreciated.
Thank you
Take a look at bedtools (https://bedtools.readthedocs.io/en/latest/) and bedops (https://bedops.readthedocs.io/en/latest/) for these kinds of operations. You shouldn't need to write any additional code yourself; it's a matter of using the right commands from these packages.
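With bedtools, something along the lines of `bedtools intersect -a file2 -b file1 -F 1.0 -wa` is probably what is wanted here (`-F 1.0` requires the whole file1 interval to be covered, `-wa` prints the original file2 line once per overlap), but please check this against the `intersect` documentation; bedtools also expects tab-separated BED input. If installing these tools is not an option, the sketch below is one lighter-weight alternative, assuming the same containment rule as before: group file1 regions by chromosome, sort them by start, and use binary search to skip regions that start outside each file2 interval. This is not how bedtools or bedops are implemented, and `build_index` / `count_contained` are hypothetical names:

```python
import bisect
from collections import defaultdict


def build_index(file1_regions):
    """Group file1 regions by chromosome, sorted by (start, end)."""
    index = defaultdict(list)
    for chrom, start, end in file1_regions:
        index[chrom].append((start, end))
    for chrom in index:
        index[chrom].sort()
    # Parallel lists of start coordinates, used for binary search.
    starts = {chrom: [s for s, _ in regs] for chrom, regs in index.items()}
    return index, starts


def count_contained(index, starts, chrom, start, end):
    """Count file1 regions on `chrom` with region start >= `start` and region end <= `end`."""
    regs = index.get(chrom, [])
    chrom_starts = starts.get(chrom, [])
    lo = bisect.bisect_left(chrom_starts, start)  # first region starting at or after `start`
    hi = bisect.bisect_right(chrom_starts, end)   # regions starting after `end` cannot fit
    return sum(1 for s, e in regs[lo:hi] if e <= end)


# Example data from the question, already parsed into tuples.
file1 = [("Chr1", 4, 10), ("Chr1", 3, 8), ("Chr2", 15, 30), ("Chr5", 1, 20)]
file2 = [("Chr1", 2, 30), ("Chr2", 5, 40), ("Chr3", 2, 10)]

index, starts = build_index(file1)
for chrom, start, end in file2:
    # Print the file2 line once per contained file1 region.
    for _ in range(count_contained(index, starts, chrom, start, end)):
        print(chrom, start, end)
```

The final check on `e <= end` still scans the candidate slice, so very wide file2 intervals can still be slow; the dedicated tools handle large inputs more carefully.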