MaSC: Mappability-Sensitive Cross-Correlation for Estimating Fragment Length
Published article describing tool: http://bioinformatics.oxfordjournals.org/content/29/4/444
Authors' website for obtaining code: http://www.perkinslab.ca/Software.html
The problem that this tool is addressing:
In ChIP-seq analysis the crucial step is 'peak calling' that identifies areas of the genome where sequence reads from the chromatin-immunoprecipitation (ChIP) pulldown are enriched compared to a control (typically genomic DNA that is not enriched by an antibody). The problem is that sequencers typically only read the start or end of input DNA fragments, which means the true extent of DNA bound by the protein of interest is not known. This is important when trying to determine regions of the genome that are bound in the experiment.
The majority of peak callers (and there are many, see Micsinai et al. 2012 for a recent review) either try to estimate the average length of fragments submitted to the sequencer, or require the user to directly input the value. The peak-caller then extends the reads (by various methods) to simulate the DNA fragments enriched by the ChIP process. The difference in predicted fragment length can produce different and poorly overlapping sets of peaks for the same sample. MACS 1.4.2 (as an example) sometimes predicts the mean fragment length to be only slightly longer than the read length, or it gives up and sets the length as 200bp.
With MaSC Ramachandran et al. (Perkins lab, OHRI) have identified "that the mappability of different parts of the genome can introduce an artificial bias" when calculating fragment length.
A useful description of "mappability" can be found here.
Results:
MaSC produces believable results in my hands and I shall certainly be running it in the future to compare its prediction of fragment length to that from a peak caller. The major caveat is that mappability data is required for this tool to work.
Notes on running MaSC:
MaSC is a simple to use Perl script, so should be easy to run on most systems. I personally had to download the Perl module 'GDGraph', which appeared to be distinct from 'GD' that I already had installed.
The usefulness of MaSC really depends on whether mappability data is available for your genome. The easiest way to find out is to go to the UCSC Genome Browser and find the relevant genome. Mappability (spelt mapability by the UCSC browser team) can be found in the 'Mapping and Sequencing Tracks' section of the browser track settings. If you click on the heading you will be taken to the 'Mapability Track settings', you can then select 'Downloads' and select the mappability data that matches the length of sequence reads used in your experiment, e.g. 36bp.
After downloading the bigWig file you will need to convert it in to WIG format, moreover as one file per chromosome. Thankfully you can download a script from UCSC called 'bigWigToWig'. The command line looks something like this:
bigWigToWig -chrom=chr1 wgEncodeCrgMapabilityAlign36mer_mm9.bigWig mm9_36mer_chr1.map
bigWigToWig -chrom=chr2 wgEncodeCrgMapabilityAlign36mer_mm9.bigWig mm9_36mer_chr2.map
EDIT: MaSC will now automatically convert the values in the WIG files obtained using the previous step!!! I have left the text for interest sake.
However, the wig values are continuous and MaSC requires a binary 0 or 1 value, so any value less than 1 need to be set to 0. But here is a handy AWK command to do this:
awk 'OFS="\t" { temp = $4 }; { if ( temp <1 ) $4=0 }; { print }' < mm9_36mer_chr1.map > mm9_36mer_binval_chr1.map
awk 'OFS="\t" { temp = $4 }; { if ( temp <1 ) $4=0 }; { print }' < mm9_36mer_chr2.map > mm9_36mer_binval_chr2.map
All these files can be deposited in a directory that MaSC can read.
MaSC also requires the length of all the chromosomes, but again there is a handy tool from UCSC to do this 'fetchChromSizes.sh'.
The tool takes sequence reads in a variety of formats.
What did you mean by
"However, the wig values are continuous and MaSC requires a binary 0 or 1 value, so any value less than 1 need to be set to 0."