intersection of two files
8.0 years ago
a.rex ▴ 350

Is there an efficient way of joining two files in the following way, e.g. using BedTools?

For example, FILE 1:

scaffold1          0        206        transcript_loc.00001      exon    
scaffold1         262      749       transcript_loc.00001      exon    
scaffold1         1391    1549     transcript_loc.00001      exon

FILE 2:

scaffold1        517     540     Simple_repeat   
scaffold1        1063    1162    LTR/Gypsy        
scaffold1        1400     1498   LTR

Resultant file:

 scaffold1          0        206        transcript_loc.00001      exon    
 scaffold1         262      749       transcript_loc.00001      exon          517     540     Simple_repeat   
 scaffold1         1391    1549     transcript_loc.00001      exon          1400     1498   LTR

In this way, wherever the two files intersect, the intersecting entry from the second file is appended as new columns to the matching line of the first file. Entries in the second file that intersect nothing are discarded.

Many thanks.

This can be done with intersectBed (bedtools intersect) from BedTools. Check the -wa, -wb and -wo options. Before using bedtools, convert your files to proper tab-delimited BED format.
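A minimal sketch of how those options might be applied to the example from the question, assuming a bedtools v2 install on the PATH. The file names (file1.bed, file2.bed, loj.bed, pairs.bed) are placeholders, and -loj is an extra option not mentioned above: it performs a left outer join, so file1 rows without any overlap are kept with a null entry for file2.

```shell
# Recreate the example inputs as tab-delimited BED (file names are placeholders).
printf 'scaffold1\t0\t206\ttranscript_loc.00001\texon\n'      > file1.bed
printf 'scaffold1\t262\t749\ttranscript_loc.00001\texon\n'   >> file1.bed
printf 'scaffold1\t1391\t1549\ttranscript_loc.00001\texon\n' >> file1.bed
printf 'scaffold1\t517\t540\tSimple_repeat\n'  > file2.bed
printf 'scaffold1\t1063\t1162\tLTR/Gypsy\n'   >> file2.bed
printf 'scaffold1\t1400\t1498\tLTR\n'         >> file2.bed

if command -v bedtools >/dev/null 2>&1; then
    # -loj: keep every file1 row; rows with no overlap get a null file2 entry.
    bedtools intersect -a file1.bed -b file2.bed -loj > loj.bed \
        || echo "bedtools rejected the input; check the column layout" >&2
    # -wa -wb: report only overlapping pairs (full file1 row, then full file2 row).
    bedtools intersect -a file1.bed -b file2.bed -wa -wb > pairs.bed \
        || echo "bedtools rejected the input; check the column layout" >&2
else
    echo "bedtools not found; skipping" >&2
fi
```

The -wo option behaves like -wa -wb but adds a final column with the number of overlapping bases.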

8.0 years ago

Sure, you can use BEDOPS bedmap to map overlaps in one file to elements in another file:

$ bedmap --echo --echo-map --delim '\t' file1.bed file2.bed > answer.bed

Your files are already in BED format, so as long as they are sorted they can be used as-is. If they are not sorted, run them through sort-bed first:

$ sort-bed file1.unsorted.bed > file1.bed
$ sort-bed file2.unsorted.bed > file2.bed

Then use bedmap, as described.

If you want to discard rows without overlaps between the first and second inputs, add the --skip-unmapped option:

$ bedmap --echo --echo-map --delim '\t' --skip-unmapped file1.bed file2.bed > answer.bed
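To sanity-check the output on small inputs, the overlap rule can be sketched in plain awk. This is a naive illustration only, not how bedmap works internally: it assumes tab-delimited input with half-open BED coordinates, compares every file1 row against every file2 row, and appends just the start, end and name of each overlapping file2 entry, as in the result shown in the question. The file names are placeholders.

```shell
# Recreate the example inputs as tab-delimited BED (file names are placeholders).
printf 'scaffold1\t0\t206\ttranscript_loc.00001\texon\n'      > file1.bed
printf 'scaffold1\t262\t749\ttranscript_loc.00001\texon\n'   >> file1.bed
printf 'scaffold1\t1391\t1549\ttranscript_loc.00001\texon\n' >> file1.bed
printf 'scaffold1\t517\t540\tSimple_repeat\n'  > file2.bed
printf 'scaffold1\t1063\t1162\tLTR/Gypsy\n'   >> file2.bed
printf 'scaffold1\t1400\t1498\tLTR\n'         >> file2.bed

# First file read (NR == FNR): store the file2 intervals.
# Second file read: print each file1 row, followed by any overlapping file2 intervals.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { chrom[++n] = $1; s[n] = $2; e[n] = $3; name[n] = $4; next }
     {
         line = $0
         for (i = 1; i <= n; i++)
             # Half-open intervals overlap when each one starts before the other ends.
             if (chrom[i] == $1 && s[i] + 0 < $3 + 0 && $2 + 0 < e[i] + 0)
                 line = line OFS s[i] OFS e[i] OFS name[i]
         print line
     }' file2.bed file1.bed > expected.tsv

cat expected.tsv
```

The nested loop is quadratic, so it is only meant for checking small examples; the sorted-input tools above are the practical choice for real data. Note that bedmap's --echo-map prints the whole overlapping row rather than only its start, end and name, so the two outputs will differ in layout.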