Tool: Bedtools: Analyzing Genomic Features
12.6 years ago

All practicing bioinformaticians will face problems that require them to compare, query and select genomic features across an entire genome. As it happens, storing and querying intervals efficiently is a surprisingly challenging problem that needs a specialized representation.

The BEDTools suite contains a set of programs that support a broad range of interval analyses that involve selecting certain locations in the genome. The name reflects the original intent to process BED files, but the tools operate just as well on GFF files. The programs are run from the command line and are available for UNIX-type systems: Linux, Mac OS X, and Cygwin (on Windows).

The link to the site is: http://code.google.com/p/bedtools/

With BEDTools one can answer questions such as:

  • how many reads map upstream/downstream of one or more locations in the genome?
  • how many reads cover a certain base in the genome?
  • which sections of the genome are not overlapping with target intervals?
  • what are the sequences specified by the coordinates?
  • ...
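
To make the second question concrete, here is a minimal Python sketch of per-base read depth. The `depth_at` function and the toy reads are made up for illustration; the sketch only shows what is being counted and says nothing about how BEDTools implements it.

```python
def depth_at(reads, chrom, pos):
    """Count reads covering the 0-based position `pos` on `chrom`.

    Reads are (chrom, start, end) tuples in BED convention:
    0-based start, end not included.
    """
    return sum(1 for c, start, end in reads if c == chrom and start <= pos < end)


# Toy reads, invented for this example.
reads = [
    ("chr1", 100, 150),
    ("chr1", 120, 170),
    ("chr1", 160, 210),
    ("chr2", 100, 150),
]
print(depth_at(reads, "chr1", 125))  # 2
```

Scanning every read for every position is fine for a toy list but far too slow genome-wide, which is where the specialized tools come in.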

The suite consists of multiple tools, but for beginners the most important is intersectBed. Understanding this tool is a gateway to understanding them all. In fact many (but not all) of the other tools, such as slopBed and windowBed, are simply convenience tools that help users prepare or format output a certain way and could be replaced by small custom scripts.
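
As a rough illustration of what an intersection computes, and of the kind of small custom script that could stand in for a convenience tool, here is a brute-force Python sketch. The function names `intersect` and `slop` and the tuple layout are assumptions of this example, not the BEDTools code or API.

```python
def intersect(a, b):
    """Return the overlapping portion of every (A, B) pair on the same chromosome.

    Intervals are (chrom, start, end) with 0-based starts and exclusive ends.
    This is a quadratic brute-force sketch, not an efficient algorithm.
    """
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a == chrom_b and start_a < end_b and start_b < end_a:
                overlaps.append((chrom_a, max(start_a, start_b), min(end_a, end_b)))
    return overlaps


def slop(interval, n, chrom_size):
    """Extend an interval by n bases on both sides, clipped to the chromosome."""
    chrom, start, end = interval
    return (chrom, max(0, start - n), min(chrom_size, end + n))


a = [("chr1", 10, 20), ("chr1", 30, 40)]
b = [("chr1", 15, 35), ("chr2", 10, 20)]
print(intersect(a, b))
print(slop(("chr1", 5, 20), 10, 25))
```

The overlap test `start_a < end_b and start_b < end_a` is the core idea; most of the other interval questions are variations on it.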

Note: a very large number of problems can be solved by running nothing more than the various programs in BEDTools and occasionally reformatting the outputs. If you are new to the field, take your time and learn what BEDTools does.

11.8 years ago

Just an FYI to those not on the bedtools mailing list. We are close to completing a new documentation site that is already more up to date than the existing PDF. Comments and suggestions are welcome, as always.

bedtools.readthedocs.org/en/latest/

In particular, the genomecov, map, and cluster utilities have (finally) been properly documented.


Can you post the bedtools recipe for annotating intervals by features such as TSS, CDS exons, 5' UTR exons, 3' UTR exons, CpG islands, repeats, introns, and intergenic regions?

12.4 years ago
enricoferrero ▴ 910

Just came here to add something that is not in the documentation but that I need to do quite often.

How to join/merge 2 or more BED files with BEDtools:

cat file1.bed file2.bed [fileN.bed] | sortBed -i stdin | mergeBed -i stdin > merged.bed
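
For anyone curious what that pipeline does conceptually, here is a small Python sketch of the same three steps (concatenate, sort, merge). The `merge_intervals` name and the toy data are mine, and the sketch merges overlapping and book-ended intervals, which I believe matches the default behaviour of the merge tool; treat it as an illustration, not a replacement.

```python
def merge_intervals(intervals):
    """Sort (chrom, start, end) tuples and merge overlapping or touching ones."""
    merged = []
    for chrom, start, end in sorted(intervals):  # chromosome, then start
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current merged block
        else:
            merged.append([chrom, start, end])
    return [tuple(block) for block in merged]


# Toy stand-ins for file1.bed and file2.bed.
file1 = [("chr1", 10, 20), ("chr2", 5, 8)]
file2 = [("chr1", 15, 30), ("chr1", 30, 35), ("chr1", 50, 60)]
print(merge_intervals(file1 + file2))
```

The sort has to come first: the merge step only ever compares an interval with the block just before it.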

You might check out BEDOPS:

bedops -m file1.bed file2.bed ... fileN.bed > merged.bed

Assuming your input files are sorted, the output will be sorted too (useful for further downstream analyses). This saves you from cat'ing everything together and sorting one larger file. The bedops program is designed from the ground up to work efficiently with any number of input files at once.
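
To show why pre-sorted inputs remove the need for a big sort, here is a hedged Python sketch that streams a k-way merge with `heapq.merge` and flattens overlapping or adjoining intervals as it goes. This is only an illustration of the idea; `merge_sorted_inputs` is a made-up name and this is not how BEDOPS is implemented.

```python
import heapq


def merge_sorted_inputs(*sorted_inputs):
    """Yield merged intervals from any number of already-sorted inputs.

    Each input is an iterable of (chrom, start, end) tuples sorted by
    chromosome and then start. Nothing is concatenated or re-sorted:
    heapq.merge interleaves the inputs lazily.
    """
    current = None
    for chrom, start, end in heapq.merge(*sorted_inputs):
        if current and current[0] == chrom and start <= current[2]:
            current[2] = max(current[2], end)
        else:
            if current:
                yield tuple(current)
            current = [chrom, start, end]
    if current:
        yield tuple(current)


first = [("chr1", 10, 20), ("chr2", 5, 8)]
second = [("chr1", 15, 30), ("chr1", 50, 60)]
print(list(merge_sorted_inputs(first, second)))
```

Because only one pending interval is held at a time, memory use stays small no matter how large the inputs are.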

10.8 years ago

We just posted an assessment of bedtools' performance with sorted and unsorted data as a function of dataset size:

http://bedtools.readthedocs.org/en/latest/#performance

9.9 years ago

Bedtools version 2.22.1 is out. Details below. Importantly, the closest tool is 30-80X faster (depending on options) now that it requires sorted input datasets. The closest tool can also search for the closest features among any number of "B" files. In addition, we have finally written proper docs for the closest tool. In the works for the next release are options to find the k-closest features and options to force the discovery of the closest feature both upstream and downstream.

https://github.com/arq5x/bedtools2/releases/tag/v2.22.1

  • When using -sorted with intersect, map, and closest, bedtools can now detect and warn you when your input datasets employ different chromosome sorting orders.
  • Fixed multiple bugs in the new, faster closest tool. Specifically, the -iu, -id, and -D options were not behaving properly with the new "sweeping" algorithm that was implemented for the 2.22.0 release. Many thanks to Sol Katzman for reporting these issues and for providing a detailed analysis and example files.
  • We FINALLY wrote proper documentation for the closest tool. http://bedtools.readthedocs.org/en/latest/content/tools/closest.html
  • Fixed bug in the tag tool when using -intervals, -names, or -scores. Thanks to Yarden Katz for reporting this.
  • Fixed issues with chromosome boundaries in the slop tool when using negative distances. Thanks to @acdaugherty!
  • Multiple improvements to the fisher tool. Added a -m option to the fisher tool to merge overlapping intervals prior to comparing overlaps between two input files. Thanks to brentp.
  • Fixed a bug in makewindows tool requiring the use of -b with -s.
  • Fixed a bug in intersect that prevented -split from detecting complete overlaps with -f 1. Thanks to @tleonardi.
  • Restored the default decimal precision to the groupby tool.
  • Added the -prec option to the merge and map tools to specify the decimal precision of the output.
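
For readers new to the closest tool mentioned above, here is a brute-force Python sketch of the question it answers: for each "A" interval, which "B" interval on the same chromosome is nearest? The names `gap` and `closest_feature` are invented for this example, and the distance used here (bases between two half-open intervals, 0 if they overlap or touch) is an assumption of the sketch; the distance bedtools reports may follow a different convention. The sweeping algorithm on sorted input gets its speed by avoiding this all-against-all scan.

```python
def gap(a, b):
    """Bases separating two (chrom, start, end) intervals; 0 if they overlap or touch."""
    return max(0, a[1] - b[2], b[1] - a[2])


def closest_feature(a_intervals, b_intervals):
    """Pair each A interval with its nearest B interval on the same chromosome.

    Returns a list of (a, b) tuples; b is None when the chromosome has no
    B intervals. Ties go to whichever B interval appears first.
    """
    pairs = []
    for a in a_intervals:
        candidates = [b for b in b_intervals if b[0] == a[0]]
        best = min(candidates, key=lambda b: gap(a, b), default=None)
        pairs.append((a, best))
    return pairs


a_intervals = [("chr1", 100, 110), ("chr3", 5, 10)]
b_intervals = [("chr1", 10, 20), ("chr1", 130, 140), ("chr2", 100, 110)]
for a, b in closest_feature(a_intervals, b_intervals):
    print(a, "->", b)
```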
11.0 years ago

I added a brief tutorial to introduce beginners to bedtools.

http://quinlanlab.org/tutorials/cshl2013/bedtools.html


bedtools complement could have a feature to extract introns from exons.bed, i.e., the intervals in between BED lines (at the moment I am using this solution).
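
To spell out what is being asked for, here is a small Python sketch that derives introns as the gaps between consecutive exons of the same transcript. It assumes each exon record carries a transcript name as a fourth field and that exons within a transcript do not overlap; `introns_from_exons` is a made-up name, not an existing bedtools feature.

```python
from collections import defaultdict


def introns_from_exons(exons):
    """Return (chrom, start, end, transcript) gaps between consecutive exons.

    Exons are (chrom, start, end, transcript) tuples with 0-based starts
    and exclusive ends, so an intron runs from one exon's end to the
    next exon's start.
    """
    by_transcript = defaultdict(list)
    for chrom, start, end, transcript in exons:
        by_transcript[(chrom, transcript)].append((start, end))

    introns = []
    for (chrom, transcript), spans in sorted(by_transcript.items()):
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start > prev_end:
                introns.append((chrom, prev_end, next_start, transcript))
    return introns


exons = [
    ("chr1", 100, 200, "tx1"),
    ("chr1", 500, 600, "tx1"),
    ("chr1", 300, 400, "tx1"),
    ("chr1", 150, 250, "tx2"),
]
print(introns_from_exons(exons))
```

Grouping by transcript matters: taking gaps across all exons on a chromosome would report intergenic space as introns.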

10.7 years ago

We just released version 2.19.1. This fixes a silly bug in 'intersect', and allows one to apply multiple operations/columns with the map tool in a single run.

$ bedtools map -a a.bed -b b.bed -c 5,5,5,5 -o min,max,median,collapse

Or:

$ bedtools map -a a.bed -b b.bed -c 3,4,5,6 -o mean

We have also refactored the code for computing operations on the overlapping columns, and this has resulted in a speedup over previous releases and other methods.
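
For anyone unsure what applying several operations in one run means, here is a hedged Python sketch: for each "A" interval, collect a score from every overlapping "B" interval and summarize it with each requested operation. The `map_scores` function, the `OPS` table and the 4-field B records are simplifications invented for this example, and the "." placeholder for intervals with no overlaps is how I understand the tool to behave.

```python
from statistics import median

# Operation name -> function over a non-empty list of scores.
OPS = {
    "min": min,
    "max": max,
    "median": median,
    "collapse": lambda values: ",".join(str(v) for v in values),
}


def map_scores(a_intervals, b_intervals, ops):
    """Summarize B scores over each A interval with every operation in `ops`.

    A intervals are (chrom, start, end); B intervals are
    (chrom, start, end, score). Coordinates are 0-based, end exclusive.
    """
    rows = []
    for chrom, start, end in a_intervals:
        scores = [
            score
            for b_chrom, b_start, b_end, score in b_intervals
            if b_chrom == chrom and b_start < end and start < b_end
        ]
        summary = [OPS[op](scores) if scores else "." for op in ops]
        rows.append((chrom, start, end, *summary))
    return rows


a_intervals = [("chr1", 0, 100), ("chr1", 200, 300)]
b_intervals = [("chr1", 10, 20, 5), ("chr1", 50, 60, 1), ("chr1", 90, 120, 9)]
for row in map_scores(a_intervals, b_intervals, ["min", "max", "median", "collapse"]):
    print(*row, sep="\t")
```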

Commands used for plot below:

runit bedtools-2.18.0 map -a ccds.exons.bed -b sample.10M.bam.bed -c 1 -o count > /dev/null
runit bedtools-2.19.0 map -a ccds.exons.bed -b sample.10M.bam.bed -c 1 -o count > /dev/null
runit bedtools-2.19.1 map -a ccds.exons.bed -b sample.10M.bam.bed -c 1 -o count > /dev/null
runit bedmap --count --echo --bp-ovr 1 ccds.exons.bed sample.10M.bam.bedmap.bed > /dev/null

# not shown (time = 21.15 seconds)
runit bedmap --count --ec --bp-ovr 1 ccds.exons.bed sample.10M.bam.bedmap.bed > /dev/null

[Figure: speed comparison]

10.2 years ago

We just released version 2.21.0.

There are three highlights. First, the intersect tool can now intersect more than two files. An example of this can be found here. Second, the intersect tool should be up to 2 times faster when using sorted data for certain use cases, owing to an enhancement to the core algorithm. Third, Brent Pedersen has contributed the "fisher" tool, which conducts a Fisher's exact test assessing the significance of the overlaps between two interval files.

Release details: http://bedtools.readthedocs.org/en/latest/content/history.html

11.0 years ago

We just released version 2.18.0 which is much faster for sorted data, includes new tools and features, and allows greater flexibility with chromosome naming and sorting. Details here.

Importantly, Google Code is being shut down by Google. As such, all releases and code will be maintained on GitHub. The repository is here.

Thanks for your patience and for the continued use of bedtools.

10.8 years ago

We just released version 2.19.0, which addresses a couple of important bugs, reduces memory usage, and confers a 3X speedup to the map tool. In addition, the map tool supports the -split option as well as alternative chromosome ordering schemes (i.e., besides lexicographic).
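
As a quick illustration of why chromosome ordering schemes matter, here is a small Python sketch contrasting lexicographic order with a "natural" numeric order. The `natural_key` helper is invented for this example; the point is only that two files sorted under different schemes list chromosomes in different orders, which tools relying on sorted input need to know about.

```python
import re


def natural_key(chrom):
    """Sort key that compares the digit runs in a chromosome name as numbers.

    "chr10" becomes ["chr", 10, ""], so it sorts after "chr2".
    """
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", chrom)]


chroms = ["chr10", "chr2", "chr1", "chrX"]
print(sorted(chroms))                   # lexicographic: chr1, chr10, chr2, chrX
print(sorted(chroms, key=natural_key))  # natural: chr1, chr2, chr10, chrX
```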

Details: https://groups.google.com/forum/#!topic/bedtools-discuss/UJpo5JJO38M

Releases: https://github.com/arq5x/bedtools2/releases

10.5 years ago

We just released versions 2.20.0 and 2.20.1. Release details: http://bedtools.readthedocs.org/en/latest/content/history.html

Download: https://github.com/arq5x/bedtools2/releases/tag/v2.20.1
