You can also use SeqMonk (http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/). It is very easy to use (they have video tutorials), runs on a Windows machine (even with average hardware), and is very fast. You need to provide BAM/SAM files.
I'll walk through the process of using the BEDOPS-based binning script binReads.sh to generate a histogram of binned reads visualized on a UCSC Genome Browser instance. These instructions assume human (build hg19) but just as easily work for assemblies of other organisms.
(1) Download and install the BEDOPS toolkit, which includes bedops, bedmap, sort-bed, conversion scripts and other utilities used in these instructions.
(2) Get the hg19 version of the chromInfo table from the UCSC Genome Browser.
Visit the UCSC Table Browser. With the All Tables group selected, for example, select the hg19 database and the chromInfo table. Output all fields to a text file. (This step can also be performed with Kent-tools' hgsql commands, if this needs automating.)
(3) Edit this text file (e.g. run awk on it to insert a start coordinate of 0 before each chromosome size) and pipe it to sort-bed to turn it into a sorted BED file. Here's a ready-to-use example for hg19 that I just made: https://dl.dropbox.com/u/31495717/chrList.bed Again, this step can be automated, but it is a file that won't need updating very often.
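If you would rather build the file yourself, step 3 can be sketched roughly as follows. The file names (chromInfo.txt, chrList.unsorted.bed) are my own placeholders, the two-line sample only stands in for the real Table Browser download, and the fallback to plain sort is just there so the snippet runs on a machine without BEDOPS:

```shell
# Placeholder input: a tiny sample in the chromInfo layout
# (chrom, size, fileName), tab-separated, with a '#' header line.
printf '#chrom\tsize\tfileName\nchr2\t243199373\t/gbdb/hg19/hg19.2bit\nchr1\t249250621\t/gbdb/hg19/hg19.2bit\n' > chromInfo.txt

# Skip the header and emit three-column BED: chrom, start (0), end (size).
awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, 0, $2 }' chromInfo.txt > chrList.unsorted.bed

# Sort with sort-bed when BEDOPS is installed; otherwise approximate its
# ordering (chromosome name, then start, then end) with plain sort.
if command -v sort-bed >/dev/null 2>&1; then
    sort-bed chrList.unsorted.bed > chrList.bed
else
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n chrList.unsorted.bed > chrList.bed
fi

cat chrList.bed
```

With the real chromInfo download in place of the printf line, the resulting chrList.bed is what gets passed to binReads.sh in the next step.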
(4) Bin the BAM-formatted read data. For example, the following makes a 75 bp-windowed read count spaced in 20 bp bins, written to a Starch-formatted archive called result.starch:
$ binReads.sh myReads.bam $PWD/result.starch 75 20 chrList.bed
You can adjust the size of windows and bins by changing the 75 and 20 parameters, respectively.
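To get a feel for how the two numbers interact, here is a toy illustration. It is not the binReads.sh source, and the window placement is an assumption on my part: each 20 bp bin reports how many reads overlap the 75 bp window that begins at the bin's start. The chromosome name, length and read coordinates are made up.

```shell
# Toy reads in BED form (chrom, start, end); hypothetical data.
printf 'chrT\t5\t41\nchrT\t30\t66\nchrT\t90\t120\n' > reads.bed

# Assumption: bin [s, s+step) gets the count of reads overlapping
# the window [s, s+win). chromLen is the toy chromosome length.
awk -v win=75 -v step=20 -v chromLen=120 'BEGIN { FS = OFS = "\t" }
    { chrom = $1; beg[NR] = $2 + 0; stop[NR] = $3 + 0 }
    END {
        for (s = 0; s < chromLen; s += step) {
            n = 0
            for (i = 1; i <= NR; i++)
                if (beg[i] < s + win && stop[i] > s) n++
            e = (s + step < chromLen) ? s + step : chromLen
            print chrom, s, e, n
        }
    }' reads.bed > toy.bedGraph

cat toy.bedGraph
```

Because the window is wider than the bin spacing, neighbouring bins share reads, which smooths the histogram. A larger first parameter smooths more; a smaller second parameter gives finer resolution and a bigger output file.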
The Starch file is just a very highly-compressed BED file. We made this format so that we could make the best use of our lab's storage capabilities. You can edit the binReads.sh script to remove the starch - call if you don't want the BED data to be compressed, which lets you skip step 5. Otherwise, we go on to the next step:
(5) Extract the binned, compressed result to a BED file:
$ unstarch result.starch > result.bedGraph
(6) Edit the result.bedGraph file to add the track type. All you need to do is insert track type=bedGraph on its own line at the top of the file, although you can add various parameters to customize the display and look, etc.
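For large files it is easier to prepend the track line from the command line than in an editor. A minimal sketch, where the two data lines are toy values standing in for the real step 5 output, and the name, description and visibility values are just examples of optional track-line attributes:

```shell
# Stand-in for the real result.bedGraph from step 5 (toy values).
printf 'chr1\t0\t20\t3\nchr1\t20\t40\t5\n' > result.bedGraph

# Write the track line first, then the data, into a new file.
{
    printf 'track type=bedGraph name="binnedReads" description="75 bp windows, 20 bp bins" visibility=full\n'
    cat result.bedGraph
} > result.track.bedGraph

head -n 1 result.track.bedGraph
```

Writing to a second file avoids clobbering the input while it is being read; rename it back to result.bedGraph afterwards if you want to keep the original name.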
(7) Place the modified result.bedGraph on a public-facing web site and copy the URL (or otherwise load a local copy) into a UCSC Genome Browser instance via the Custom Track page (Genomes > manage custom tracks). The Genome Browser will recognize it as a bedGraph file and render it accordingly.
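If you are automating this, the browser can also be pointed at the hosted file directly through the hgt.customText URL parameter instead of pasting into the Custom Track page. A small sketch, assuming the file sits at a hypothetical example.org address and that you want to open a region on chr1:

```shell
# Hypothetical public URL where the modified result.bedGraph is hosted.
track_url="http://example.org/tracks/result.bedGraph"

# hgt.customText asks hgTracks to load a custom track from that URL;
# db and position select the assembly and the region to display.
browser_url="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr1:1-100000&hgt.customText=${track_url}"

echo "$browser_url"
```

Opening the printed link in a web browser should load the track for that session; URLs containing unusual characters may need percent-encoding first.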
That's all there is to it. All these steps can be automated, once you have the process down.
many different file formats but all based on coverage. Any of them can be read by IGV. If you want efficiency, try a binary data format, as it will load faster.
hth, -Abhi