Hi,
I have a simple question that I have not managed to solve. I have a text file with genome-wide heterozygosity records for one individual, like this:
NC_018723.3 50001 305 39182 0.00778419
NC_018723.3 150000 644 78927 0.00815944
NC_018723.3 250000 28 83487 0.000335382
NC_018723.3 350000 43 84221 0.000510561
NC_018723.3 450000 56 73332 0.00076365
NC_018723.3 550000 52 77842 0.00066802
Here the first column is the chromosome; the second column is the genomic coordinate (I used non-overlapping windows of 100 kb, and the second column holds the midpoint of each window); the third column is the number of SNPs; the fourth column is the number of called bases; and the fifth column is the ratio SNPs / called bases.
I also have a second file with the bed coordinates of the regions that I want to first include and later exclude when recalculating the heterozygosity. What I want is this: if a window falls within one of the regions in my bed file, write it to one file (all included windows), and write the remaining windows to a second file (all excluded windows). How can I do this? It does not need to be done in bash!
The second file, with the bed coordinates, looks like this:
NC_018723.3 203270 441160
NC_018723.3 624960 695520
NC_018723.3 756696 977820
NC_018723.3 1005429 1221086
NC_018723.3 1240095 1705853
NC_018723.3 1747839 1964846
NC_018723.3 1975644 2136144
NC_018723.3 2169657 2651377
And the expected output file (the included windows) would be this:
NC_018723.3 250000 28 83487 0.000335382
NC_018723.3 350000 43 84221 0.000510561
NC_018723.3 450000 56 73332 0.00076365
These three windows are the ones that overlap the first region in the bed file (203270 to 441160).
Thanks in advance!
It would help us understand the issue if you posted the expected output and example input files instead of only explaining the problem.
Sorry about that. Part of the first input file was already in my post; I have also provided the second file with the bed coordinates and the expected output, in which the three entries overlap the first region of the bed file. Does this help?
Thank you. I added this information to the OP so that others can understand the post.
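One possible way to do this in Python, as a minimal sketch rather than a definitive solution. It assumes that the second column is the midpoint of a 100 kb window (so each window spans the midpoint plus or minus 50 kb) and that "within a region" means the window overlaps a bed interval, which is what reproduces the expected output in the post. The names `read_bed`, `split_windows` and `HALF_WINDOW` are made up for this example.

```python
from collections import defaultdict

# Assumption: 100 kb windows, with column 2 holding the window midpoint.
HALF_WINDOW = 50_000


def read_bed(lines):
    """Return {chromosome: [(start, end), ...]} from whitespace-separated bed lines."""
    regions = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        regions[fields[0]].append((int(fields[1]), int(fields[2])))
    return regions


def split_windows(window_lines, regions, half=HALF_WINDOW):
    """Split window records into (overlapping, non_overlapping) lists of lines."""
    included, excluded = [], []
    for line in window_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, mid = fields[0], int(fields[1])
        w_start, w_end = mid - half, mid + half
        # Two intervals overlap when each one starts before the other ends.
        hit = any(w_start < b_end and b_start < w_end
                  for b_start, b_end in regions.get(chrom, []))
        (included if hit else excluded).append(line.rstrip("\n"))
    return included, excluded


# Sample data copied from the post.
windows = """NC_018723.3 50001 305 39182 0.00778419
NC_018723.3 150000 644 78927 0.00815944
NC_018723.3 250000 28 83487 0.000335382
NC_018723.3 350000 43 84221 0.000510561
NC_018723.3 450000 56 73332 0.00076365
NC_018723.3 550000 52 77842 0.00066802""".splitlines()

bed = """NC_018723.3 203270 441160
NC_018723.3 624960 695520""".splitlines()

included, excluded = split_windows(windows, read_bed(bed))
print("\n".join(included))
```

With real files you could pass open file handles instead of the sample lists (both functions only iterate over lines) and write `included` and `excluded` to two output files. Every window is checked against every region on its chromosome, which is fine for files of this size; for very large inputs an interval tree or a dedicated tool such as bedtools would probably scale better.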