I have two tab-separated files. File_1 contains exon information (output by StringTie) and is structured like this:
e_id chr strand start end rcount ucount mrcount cov cov_sd mcov mcov_sd
1 1 + 3631 3913 46 46 46 10.371 10.5056 10.371 10.5056
2 1 + 3996 4276 83 83 83 22.3559 4.7919 22.3559 4.7919
3 1 + 4486 4605 47 47 47 25.2333 7.4294 25.2333 7.4294
4 1 + 4706 5095 120 120 120 23.5718 3.9786 23.5718 3.9786
File_2 contains splice junction information (output by STAR) and is structured like this:
chr start end strand
1 3914 3995 1
1 4277 4485 1
1 4496 4505 1
1 4716 5075 1
* strand (0: undefined, 1: +, 2: -)
I am looking for a script that first checks that the chromosome matches and then extracts the lines where the start and end coordinates of file_2 ($2 and $3) lie within the start and end coordinates of file_1 ($4 and $5). The expected output is each overlapping row from file_1 followed by the matching row from file_2. For example, the start and end coordinates in the third and fourth rows of file_2 lie within the third and fourth rows of file_1, so the expected output would be:
e_id chr strand start end rcount ucount mrcount cov cov_sd mcov mcov_sd chr start end strand
3 1 + 4486 4605 47 47 47 25.2333 7.4294 25.2333 7.4294 1 4496 4505 1
4 1 + 4706 5095 120 120 120 23.5718 3.9786 23.5718 3.9786 1 4716 5075 1
Thanks in advance
Many thanks for your efforts. Is it possible to place the matching rows from file_2 directly alongside the matching rows of file_1? Currently it pastes all rows from file_1 and only the matched rows from file_2.
Try swapping the files when you paste.
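If paste keeps misaligning the rows, one alternative is to do the join in a single pass. It pairs each file_1 row only with the file_2 rows it contains. The following is a minimal Python sketch, not a tested pipeline. The function name contained_junctions and the inline sample data are made up for illustration. It assumes both files are tab-separated with one header line, and it compares only the chromosome and the coordinates (strand is not checked, as in the expected output above).

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question; spaces are turned into tabs here
# only so the sketch is self-contained.
EXONS = """e_id chr strand start end rcount ucount mrcount cov cov_sd mcov mcov_sd
1 1 + 3631 3913 46 46 46 10.371 10.5056 10.371 10.5056
2 1 + 3996 4276 83 83 83 22.3559 4.7919 22.3559 4.7919
3 1 + 4486 4605 47 47 47 25.2333 7.4294 25.2333 7.4294
4 1 + 4706 5095 120 120 120 23.5718 3.9786 23.5718 3.9786
""".replace(" ", "\t")

JUNCTIONS = """chr start end strand
1 3914 3995 1
1 4277 4485 1
1 4496 4505 1
1 4716 5075 1
""".replace(" ", "\t")


def contained_junctions(exon_lines, junction_lines):
    """Yield each exon row joined with every junction row it fully contains."""
    exons = csv.reader(exon_lines, delimiter="\t")
    junctions = csv.reader(junction_lines, delimiter="\t")

    # Combined header: file_1 columns followed by file_2 columns.
    yield next(exons) + next(junctions)

    # Group junctions by chromosome so an exon is only compared
    # against junctions on the same chromosome.
    by_chrom = defaultdict(list)
    for junction in junctions:
        if junction:  # skip blank lines
            by_chrom[junction[0]].append(junction)

    for exon in exons:
        if not exon:
            continue
        chrom, exon_start, exon_end = exon[1], int(exon[3]), int(exon[4])
        for junction in by_chrom.get(chrom, []):
            # Containment test: junction start and end both inside the exon.
            if exon_start <= int(junction[1]) and int(junction[2]) <= exon_end:
                yield exon + junction


if __name__ == "__main__":
    for row in contained_junctions(io.StringIO(EXONS), io.StringIO(JUNCTIONS)):
        print("\t".join(row))
```

With real files, the two io.StringIO objects would be replaced by file handles, for example open("File_1", newline="") and open("File_2", newline=""). Chromosome names are compared as strings, so "1" and "chr1" would not match unless the naming is made consistent first.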