Depleting CTCF sites from interval file
7.6 years ago
rbronste ▴ 420

Hi,

What is the most efficient way to remove CTCF sites from a BED file? Thanks.

Rob.

ChIP-Seq • 1.6k views
7.6 years ago
mmmmcandrew ▴ 200

bedtools intersect can probably get the job done. Take your regions.bed file and a separate BED file containing CTCF sites, then use the -v option to output only the regions that do not overlap a CTCF site, like this:

bedtools intersect -v -a regions.bed -b CTCF.bed > regions_noCTCF.bed

By default, any region with even a single base pair of overlap with the B file is removed. You can change that so a minimum amount of overlap is required for removal, for example with the -f option, which sets the minimum overlap as a fraction of each A region.
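
To make the -v behaviour concrete, here is a minimal Python sketch of the filtering logic. It is an illustration only, not how bedtools works internally. The function names `overlap_bp` and `drop_overlapping`, the `min_frac` parameter, and the toy intervals are all made up for this example; `min_frac` is meant to mimic requiring a minimum overlap as a fraction of each region. Coordinates are assumed to be 0-based, half-open, as in BED.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def drop_overlapping(regions, sites, min_frac=None):
    """Keep regions that do not overlap any site.

    regions, sites: lists of tuples starting with (chrom, start, end);
    extra columns on a region are carried through untouched.
    min_frac: hypothetical knob. If set, a region is dropped only when a
    single site covers at least this fraction of the region's length.
    With None, one shared base is enough to drop the region.
    """
    kept = []
    for region in regions:
        chrom, start, end = region[:3]
        length = end - start
        hit = False
        for site in sites:
            s_chrom, s_start, s_end = site[:3]
            if s_chrom != chrom:
                continue
            shared = overlap_bp(start, end, s_start, s_end)
            if shared == 0:
                continue
            if min_frac is None or shared >= min_frac * length:
                hit = True
                break
        if not hit:
            kept.append(region)
    return kept


# Toy data, made up for illustration.
regions = [
    ("chr1", 100, 200, "peakA"),  # shares 10 bp with the first site
    ("chr1", 300, 400, "peakB"),  # no overlap
    ("chr2", 100, 200, "peakC"),  # no overlap on chr2
]
ctcf = [("chr1", 190, 210), ("chr2", 500, 600)]

print(drop_overlapping(regions, ctcf))                # default: any overlap removes
print(drop_overlapping(regions, ctcf, min_frac=0.5))  # require 50% of the region
```

With the default, peakA is dropped because it shares bases with a site. With `min_frac=0.5` it survives, since the 10 shared bases are well under half of its 100 bp length.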

7.6 years ago

With BEDOPS bedops:

$ bedops --not-element-of -1 regions.bed CTCF.bed > regionsWithoutCTCFOverlaps.bed

Using --not-element-of preserves the original intervals in regions.bed and any additional columns they have (ID, score, strand, etc.).

If you actually wanted to carve out the space taken up by CTCF intervals, you could use --difference:

$ bedops --difference regions.bed CTCF.bed > answer.bed

This calculates new intervals, discarding additional columns in regions.bed.
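
For anyone who wants to see the "carving" logic spelled out, here is a rough Python sketch of interval subtraction. It is not the BEDOPS implementation. The function name `subtract_sites` and the toy coordinates are invented for illustration, and the sketch assumes half-open BED coordinates and regions that do not overlap one another.

```python
def subtract_sites(regions, sites):
    """Return the parts of each region not covered by any site.

    regions, sites: lists of (chrom, start, end) tuples, half-open.
    The output is a list of new three-column intervals, so any extra
    columns from the input would be lost.
    """
    pieces = []
    for chrom, start, end in regions:
        # Sites on the same chromosome, ordered left to right.
        same_chrom = sorted(
            (s_start, s_end)
            for s_chrom, s_start, s_end in sites
            if s_chrom == chrom
        )
        cursor = start  # leftmost base of the region not yet handled
        for s_start, s_end in same_chrom:
            if s_end <= cursor:
                continue  # site lies entirely to the left of the cursor
            if s_start >= end:
                break  # this and all later sites start past the region
            if s_start > cursor:
                pieces.append((chrom, cursor, s_start))  # gap before the site
            cursor = max(cursor, s_end)
            if cursor >= end:
                break  # region fully consumed
        if cursor < end:
            pieces.append((chrom, cursor, end))  # tail after the last site
    return pieces


# Toy data, made up for illustration.
regions = [("chr1", 100, 200), ("chr1", 300, 400)]
ctcf = [("chr1", 120, 130), ("chr1", 150, 160), ("chr1", 390, 410)]

for piece in subtract_sites(regions, ctcf):
    print(*piece, sep="\t")
```

In this toy case the first region has two sites in its interior and comes out as three pieces, while the second region is only trimmed at its right end.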


If I understand correctly the first option leaves me with a file where the CTCF sites are identified in the BED, and the second option totally drops them out of the BED record? Thanks again!


The first option removes any elements that overlap CTCF sites by one or more bases. The second option keeps every element but cuts out the genomic space within it that is occupied by CTCF sites. The cartoons in the BEDOPS docs explain this graphically.


Very helpful info, thank you. In the second case, once the genomic space is removed, does the interval get split into two if the CTCF site isn't on one end or the other? Just trying to see a signature of this in the number of intervals at the end.


Yes, you'd get two or more pieces. It's like painting a wall and pulling away pieces of masking tape from within the middle of the wall, if that analogy is useful.

However, an easier tool to use for that would be bedmap:

$ bedmap --echo --skip-unmapped --fraction-map 1 regions.bed CTCF.bed > regionsThatEntirelyContainCTCFSites.bed

Then run wc -l on regionsThatEntirelyContainCTCFSites.bed and regions.bed to get counts. This would give an accurate account of relative, full CTCF occupancy.
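
The counting idea can also be sketched in a few lines of Python. This is an illustration under assumptions, not a description of bedmap itself: `--fraction-map 1` is read here as "the whole CTCF element lies inside the region", the function name `regions_containing_site` is hypothetical, and the intervals are toy data using half-open BED coordinates.

```python
def regions_containing_site(regions, sites):
    """Return regions that fully contain at least one site.

    regions, sites: lists of tuples starting with (chrom, start, end).
    A site counts only if every one of its bases falls inside the region.
    """
    kept = []
    for region in regions:
        chrom, start, end = region[:3]
        if any(
            site[0] == chrom and start <= site[1] and site[2] <= end
            for site in sites
        ):
            kept.append(region)
    return kept


# Toy data, made up for illustration.
regions = [
    ("chr1", 100, 200, "peakA"),  # fully contains the first site
    ("chr1", 300, 400, "peakB"),  # only partially overlaps the second site
    ("chr2", 100, 200, "peakC"),  # no site nearby
]
ctcf = [("chr1", 120, 140), ("chr1", 390, 410), ("chr2", 500, 600)]

full = regions_containing_site(regions, ctcf)
print(f"{len(full)} of {len(regions)} regions fully contain a CTCF site")
```

The two lengths printed at the end play the role of the two wc -l counts.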
