Hello,
I would greatly appreciate some help with this issue I'm having. Thank you in advance!
Summarizing the question in a post title was tricky, so I apologize if the title is unclear. I'll explain in detail below. To start, I have two bed files, one with non-overlapping regions of varying sizes, and one with SNP positions:
$ cat A.bed
chr1 65564 70008
chr1 944693 959240
chr1 4245331 4266330
chr1 17146107 17167757
$ cat B.bed
chr1 15715
chr1 1412732
chr1 1612000
chr1 16524041
The first bed file is ~2000 rows long, while the SNP file is almost 300,000 rows long. What I need to obtain are all of the pairwise comparisons/combinations between each row of the first bed file and each row of the second bed file, and only print these combinations to a new bed file if the SNP is +/- 250,000 bp from the start of the region in A.bed. So, the third bed file, C.bed, would contain all of these SNP-region pairs, as long as the SNP is within the specified distance of that region (250,000 bp upstream of the start of the region and 250,000 bp downstream of the start). If the SNP isn't close to any region, it is skipped.
I will be carrying this out separately for each chromosome, so the contents of the first column don't matter for the comparison. The output, C.bed, should have 4 columns.
It would look like this:
$ cat C.bed
chr1 15715 65564 70008
chr1 1412732 944693 959240
chr1 1612000 944693 959240
chr1 16524041 17146107 17167757
I tried finding the intersection of these bed files using bedtools intersect, but I couldn't figure out the right combination of parameters to use! Would this be easier in R? Thank you.
-J
Show us what you tried
Thank you for replying Pierre.
I first added a third column to the SNP bed file, so that I would get a range of 1 for each SNP
Next, I tried using bedtools window to find overlapping features between A.bed and B2.bed, but couldn't figure out how to restrict this to +/- from the start of A.bed regions only, and not both start and end (which is the default behaviour).
but in your original post you said
write the full command lines, unfortunately we don't read in minds.
The command lines I wrote out for you are the ones I actually used. My original reference to bedtools intersect was because I assumed the bedtools window command was a component of bedtools intersect. It was simply a mistake; I'm new to this. I don't expect anyone to read my mind, hence my effort to make my question as detailed as possible. Not sure why I'm being met with such a snarky remark...