So I'm trying to understand the --shift and --extsize parameters in MACS2. I have inherited an ATAC-seq that used the following MACS2 command.
macs2 callpeak -t ATAC_sample-1.bam --nomodel --shift -100 --extsize 200 -g 1.5e9 -f BAM
I'm trying to see how I can apply these principles to a Rscript I have that converts my raw BAM file to a coverage file? The full script can be looked at here but the main lines I'm looking at are:
aln <- as(aln, "GRanges") # Converts each read mapping to GRanges coord
aln <- resize(aln, 150) # Extended the reads to the fragment length (4 previous exp)
cov <- coverage(aln) # Get Coverages nucleotide
The definitions for extsize and shift can be seen in the documentation but I'll copy it for reference.
--extsize
While '--nomodel' is set, MACS uses this parameter to extend reads in 5'->3' direction to fix-sized fragments. For example, if the size of binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set as 200. This option is only valid when --nomodel is set or when MACS fails to build model and --fix-bimodal is on.
--shift
Note, this is NOT the legacy --shiftsize option which is replaced by --extsize! You can set an arbitrary shift in bp here. Please Use discretion while setting it other than default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5') then apply --extsize from 5' to 3' direction to extend them to fragments. When this value is negative, ends will be moved toward 3'->5' direction, otherwise 5'->3' direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with --extsize option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you can't set values other than 0 if format is BAMPE or BEDPE for paired-end data. Default is 0.
Here are some examples for combining --shift and --extsize:
EXAMPLE 1: To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both direction to smooth the pileup signals. If the wanted smoothing window is 200bps, then use '--nomodel --shift -100 --extsize 200'.
So to summarize my question is how should I resize my raw reads so that the coverage will reflect the --shift and --extsize used by MACS2? From reading the documentation it seems I should shift the reads 100 bp in both directions because it is extending the reads 200 bp 5' -> 3' (--extsze) and back 100 bp with (--shift).
I appreciate your help! The function it self seems easy. It's just herd for me to understand biologically how to write the function (if that makes sense). Based on what you are saying, are you saying that you --shift -100 means shifting read -100 from the start spot?
so lets say I have ranges
--shift -100 turns this into:
Is that correct?
Than --extsize 200 takes the output of that and extends the fragment size to 200
Is that right??
What you showed is how I understand it happening as well :)