If your (tab-delimited) text file does not have a header row:
$ awk '($1 < 21) && ($2 > 21)' data.txt > answer.txt
If your text file has a header row ("start" and "end"):
$ tail -n +2 data.txt | awk '($1 < 21) && ($2 > 21)' > answer.txt
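As a sketch, the position can also be passed to awk as a variable and the header skipped inside awk itself, so one command covers both cases. The sample `data.txt` contents and the variable name `pos` are made up for illustration. The comparison is the same strict "start < pos < end" test used above.

```shell
# Hypothetical sample input: a header row plus tab-delimited start/end columns.
printf 'start\tend\n1\t10\n6\t15\n18\t30\n20\t25\n' > data.txt

# -F'\t' splits on tabs only; NR > 1 skips the header row;
# pos is the position to look for (21 here, as in the question).
awk -F'\t' -v pos=21 'NR > 1 && ($1 < pos) && ($2 > pos)' data.txt > answer.txt

cat answer.txt
```

With this sample, only the rows `18 30` and `20 25` are written to answer.txt. For a file without a header, drop the `NR > 1 &&` part.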
Since you tagged your question with BED: if you're working with a BED file, there is a faster way to do this.
The BEDOPS bedextract tool can do a binary, or O(log n), search over a sorted BED file, whereas a simplistic use of awk (such as the commands I wrote above) reads through the entire file, which is a linear, or O(n), search. For large sorted BED files, a linear scan is a waste of time.
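Because the binary search only works on sorted input, the BED file has to be sorted first. BEDOPS provides `sort-bed` for this. Where it is not installed, a rough stand-in is `sort` with byte-order collation, sorting lexicographically on the chromosome and numerically on start and end. Treat the equivalence with `sort-bed` ordering as an assumption; the file names `unsorted.bed` and `query.bed` are hypothetical.

```shell
# Hypothetical unsorted BED input (chrom, start, end, name).
printf 'chrN\t30\t40\tc\nchr1\t5\t9\ta\nchrN\t6\t15\tb\nchrN\t18\t30\td\n' > unsorted.bed

# Preferred, if BEDOPS is installed:  sort-bed unsorted.bed > query.bed
# Stand-in: byte-order (LC_ALL=C) sort on chromosome, then numeric start, then numeric end.
LC_ALL=C sort -k1,1 -k2,2n -k3,3n unsorted.bed > query.bed

cat query.bed
```

The sort only has to be done once; after that, every bedextract query against the file can use the fast search.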
For example, a search for position 21 along a hypothetical chromosome chrN is much faster this way:
$ echo -e "chrN\t21\t22" | bedextract query.bed - > answer.bed
The file answer.bed contains the elements from query.bed that overlap ("contain") position 21, i.e. the half-open genomic region [21, 22), along chromosome chrN.
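For comparison, here is a sketch of an equivalent overlap test as a plain linear awk scan: it needs no BEDOPS but reads the whole file. It assumes bedextract reports any element overlapping the query by at least one base. Under BED's zero-based, half-open convention, an element [start, end) overlaps [21, 22) exactly when start < 22 and end > 21, i.e. start <= 21 < end. This is slightly looser than the strict `$1 < 21` test in the awk commands above, since it also keeps elements that start at 21. The sample `query.bed` and the variable names `chr`, `qs` and `qe` are illustrative.

```shell
# Hypothetical sorted BED input (chrom, start, end).
printf 'chrN\t6\t15\nchrN\t18\t21\nchrN\t18\t30\nchrN\t21\t25\nchrX\t0\t100\n' > query.bed

# Keep elements on chrN whose half-open interval [start, end) overlaps [qs, qe) = [21, 22).
awk -F'\t' -v chr=chrN -v qs=21 -v qe=22 \
    '$1 == chr && $2 < qe && $3 > qs' query.bed > answer.bed

cat answer.bed
```

Here `chrN 18 21` is excluded because its end coordinate is exclusive, while `chrN 21 25` is kept.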
What do you mean by "21 in between"? Since you list the interval "6 15" in your desired output, I don't understand it.
I changed the formatting. I guess the OP wants to return the intervals that contain a given position. However, [6,15] doesn't contain 21, so the example is wrong. Otherwise this looks like a simple case of interval arithmetic. I would like to know the biological application, since it determines which method is best.
If you need to search for different positions repeatedly in a large set of intervals, see: What Is The Quickest Algorithm For Range Overlap?