How To Intersect A Range With Single Positions
3
4
Entering edit mode
10.7 years ago
roll ▴ 350

Hi,

Is there any tool that will find whether a position fall into a certain region other than awk or bedtools intersect? That is, lets's assume i have a range with position

chr1:10000-20000

and i have a list of positions

chr1 100 1 2 3
chr1 12000 1 2 3
chr1 15000 1 2 3
chr1 25000 1 2 3

So that it will catch chr1 12000 1 2 3 and chr1 15000 1 2 3

I am trying this with awk but it is too slow considering i have lots of positions and ranges. I thought about bedtools intersect but then again, i have some extra information that i dont know how to keep that extra information if i convert the single positions into bed format.

Is there any tools that does this faster than awk?

intersect bed • 7.3k views
ADD COMMENT
0
Entering edit mode

If you want to keep that information in a BED file, you can just separate the last 3 columns by commas and make them the 4th column. You'll have to reparse the output, of course.

ADD REPLY
0
Entering edit mode

OR write a short script that will merge the new intersect file with the original file with the extra information.

ADD REPLY
4
Entering edit mode
10.7 years ago

Convert your positions to bed as you said and use intersectBed to give all the columns in your a and b files with options -wa and -wb (or I miss something?):

cat region.bed 
chr1    10000    20000

cat pos.txt 
chr1    100    1    2    3
chr1    12000    1    2    3
chr1    15000    1    2    3
chr1    25000    1    2    3

 awk 'BEGIN{OFS="\t"} {print $1, $2-1, $2, $3, $4, $5}' pos.txt \
    | intersectBed -a - -b region.bed -wa -wb

chr1    11999    12000    1    2    3    chr1    10000    20000
chr1    14999    15000    1    2    3    chr1    10000    20000

Hope this helps!
Dario

ADD COMMENT
0
Entering edit mode

Great. Thanks for the info. It has been a while since I have updated myself :-)

ADD REPLY
0
Entering edit mode

I've been trying to use this example for my situation and it prints to the screen well but when I try to output it to a file, it doesn't work. This is what I've tried:

awk 'BEGIN{OFS="\t"} \
    {print $1, $2-1,$2, $3, $4, $5, $6, $7, $8, $9, $10}' Nonsensetranslated.txt \
    | intersectBed -a - -b final_starts_stops.bed -wa -wb > done.txt

My error informs me:

awk: can't open file
 input record number 15577, file
 source line number 1
ADD REPLY
2
Entering edit mode
10.7 years ago
Neilfws 49k

Another solution: the findOverlaps() method from the R package IRanges. It's very fast, once you get everything into the right format and figure out the R syntax.

Some example code, using your example data:

library(IRanges)
# first, create data frames
df1 <- data.frame(chr = "chr1", start = 10000, end = 20000)
df2 <- data.frame(chr = rep("chr1", 4), posn = c(100, 12000, 15000, 250000), x = rep(1, 4), y = rep(2, 4), z = rep(3, 4))
# next, create IRanges objects
df1.ir <- IRanges(start = df1$start, end = df1$end, names = df1$chr)
df2.ir <- IRanges(start = df2$posn, end = df2$posn, names = df2$chr)
# convert the "subject" to an interval tree (much faster)
df1.tree <- IntervalTreedf1.ir)
# now find the overlaps
o <- findOverlapsdf2.ir, df1.tree, type = "within")
df2[o@queryHits,]
#       chr  posn x y z
#   2  chr1 12000 1 2 3
#   3  chr1 15000 1 2 3
ADD COMMENT
0
Entering edit mode

Just adding on to this answer. Please see here. Basically its better to use GRanges when chromosomal context is required.

ADD REPLY
1
Entering edit mode
10.7 years ago

Using BEDOPS bedmap might be helpful, to instead get an answer in a more concise range-to-mapped-positions format:

$ sort-bed ranges.unsorted.bed > ranges.bed
$ awk '{print $1"\t"$2"\t"$2+1"\t"$3"\t"$4"\t"$5}' positions.txt \
    | sort-bed - \ 
    | bedmap --echo --echo-map ranges.bed - \
    > answer.bed

Results will show each range, along with any mapped positions that overlap that range:

chr1   10000    20000 | chr1   12000 12001  1  2  3;  chr1   15000 15001  1  2  3

You can strip the extra column out with downstream tools (another awk statement, for instance).

ADD COMMENT

Login before adding your answer.

Traffic: 1779 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6