Doubts related to MACS tool
9.0 years ago

Dear Users,

I have a few doubts about using MACS2.

Peak calling

  1. macs2 callpeak -t chip.bedgraph -c input.bedgraph --outdir Input_test -B --nomodel --SPMR

This step generates a control lambda .bdg file and a treatment pileup .bdg file.

As I understand it, --SPMR normalizes the dataset to 1M reads. If the data is normalized, can we convert these files to bigWig and then visualize the control and treated samples? Does that make sense?

  2. macs2 bdgcmp -t treated_pileup.bdg -c control_lambda.bdg -m logLR --outdir Ba-HW-bdgcmp -o out.bdg

This command is used to remove background noise by comparing the treated set against the control. If that is the case, what does step (1) do? Is the data not already normalized in step (1)? If so, what is the use of --SPMR?

Adding to my confusion, bdgcmp has --scaling-factor as an option. To my understanding this is also used for normalization. Also, if I want to use logLR with bdgcmp I have to provide the -p parameter, but the help says this parameter is applied after normalization for sequencing depth. How should I proceed? Please guide me.

Questions

  1. Which step does the normalization, --SPMR or the scaling factor? If it is the scaling factor, how do I estimate its value?
  2. Which files should I take for comparing the ChIP and input data? callpeak generates a .bdg for control and one for treated, and after that bdgcmp generates one more .bdg file. Which files should be used for visualization in IGV?
  3. If I want to provide -m logLR to bdgcmp, I have to provide -p as well. In order to use -p, does the data have to be normalized for sequencing depth?

I am sorry, I am quite new to macs2 and so have a lot of confusion.

Your inputs will be highly appreciated.

Thank You,
Pinky

ChIP-Seq • 8.2k views

Thanks Ian. I am not very confident with macs2 and have a few fundamental doubts.

It would be great if you could clarify them.

  1. In the macs2 output file NA_peaks.xls, what does pileup mean? Does NA_peaks.xls give only the enriched regions in the treated sample?

    The manual says it is the pileup height at the peak summit. Does that mean the number of reads aligned at that peak region?

  2. What is the use of the NA_peaks.narrowPeak file? It contains the same number of peaks as NA_peaks.xls, so what is its purpose?

  3. In my results I could not find a NAME_negative_peaks.xls file. Below are the parameters I used for peak calling.

    macs2 callpeak -t treated.bed -c control.bed --outdir output -B --nomodel --SPMR -q 0.01
    

Your inputs are highly appreciated.

Thanks
Pinky


Sorry, I missed your reply, but Pierre seems to have answered.



9.0 years ago
Ian 6.1k

For point 1, --SPMR is only used in conjunction with --bdg (-B). The counts in the bedGraph file are then normalised by the number of millions of reads/fragments in the ChIP sample, after deduplication etc., e.g. 12 fragments / 20 (million fragments in ChIP). You can then convert the bedGraph to bigWig using the UCSC tool bedGraphToBigWig.
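The arithmetic in that example can be sketched in a few lines of Python. This only illustrates the per-million scaling and is not MACS2's own code; the function name `spmr` and the numbers are made up for the example.

```python
def spmr(pileup: float, total_fragments: int) -> float:
    """Signal per million reads: raw pileup divided by library size in millions."""
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return pileup / (total_fragments / 1_000_000)


# The example above: a pileup of 12 fragments in a ChIP library
# of 20 million fragments (after deduplication).
print(spmr(12, 20_000_000))
```

Because each track is divided by its own library size, treatment and control end up on the same per-million scale and can be viewed side by side.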

I don't have experience of bdgcmp.

9.0 years ago
Alternative ▴ 290

There are two different things: "Sample normalization" and "Noise deduction".

  1. You can normalize any library you sequenced to reads per million. This is what --SPMR is for. When this option is specified, the normalization is applied to both your "treatment" and "control" tracks. You can then convert both to bigWig and visualize them.

  2. In addition to that, for visualization purposes and some metadata analysis, you can "subtract", "divide" or take "log2(Treatment/Ctr)" of your treatment over your control. This is to deduct noise, and it is what bdgcmp allows you to do. You can do that on tracks built directly from your bam files, without calling peaks, which is why the option exists. For instance, you can calculate the RPM fraction yourself and give it to bdgcmp. As another example, to show signal on your treatment sample you can decide to show "normalized signal with input subtracted", which means that both treatment and control are normalized to RPM and the input is subtracted from the treatment.

  3. -p is given because there are different ways of normalization. You can normalize by RPM, RPGC ...

  4. NA_peaks.narrowPeak is the same as the xls but in narrowPeak format. This is the format bioinformaticians usually operate on. More on the narrowPeak format at https://genome.ucsc.edu/FAQ/FAQformat.html#format12
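To make point 2 concrete, here is a rough Python sketch of three bdgcmp-style comparisons (subtract, FE and logFE) on a single pair of values. It is an illustration, not MACS2's implementation, and the function names are invented for the example. As far as I can tell from the bdgcmp help text, -p is a pseudocount added to both tracks before FE, logFE or logLR is computed, and logFE is a log10 ratio; the sketch assumes that reading.

```python
import math


def subtract(treat: float, ctrl: float) -> float:
    """Treatment signal with the control signal subtracted."""
    return treat - ctrl


def fold_enrichment(treat: float, ctrl: float, pseudocount: float = 0.0) -> float:
    """Linear fold enrichment; the pseudocount is added to both values."""
    return (treat + pseudocount) / (ctrl + pseudocount)


def log_fold_enrichment(treat: float, ctrl: float, pseudocount: float = 0.0) -> float:
    """log10 of the fold enrichment (assumed to match bdgcmp's logFE)."""
    return math.log10(fold_enrichment(treat, ctrl, pseudocount))


# Toy SPMR-normalized values at one position (made-up numbers).
treat, ctrl = 2.0, 0.5
print(subtract(treat, ctrl))
print(fold_enrichment(treat, ctrl))
print(log_fold_enrichment(treat, ctrl, pseudocount=1.0))
```

Without a pseudocount, a control value of 0 makes the ratio undefined (the sketch raises ZeroDivisionError), which is the practical reason a small pseudocount is supplied for the ratio-based methods.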


Dear Pierre,

Thank You for your explanation.

I would like to go one step back and get a few more inputs.

  1. You discussed sample normalization using the SPMR option. Is it supposed to be done separately for input and IP, or does the command below take care of it?

    macs2 callpeak -t treated.bed -c control.bed --outdir output -B --nomodel --SPMR -q 0.01
    

    What does the 4th column of *_control_lambda.bdg and *_treat_pileup.bdg mean in the MACS2 output?

    Ans) Is it the fold enrichment? If so, how is it calculated?

  2. My control library has ~16M reads and the treated library has 6M reads. How does this affect the MACS2 pipeline?

    To my understanding, the data is scaled to the smaller library.

  3. How does control_lambda.bw (the bigWig file) differ from the bam file?

    Ans) Is it that the .bw file gives only the portion of the genome that is enriched, whereas the bam gives the complete alignment coverage across the genome?

  4. Which files should be used for visualization, the sorted bam files or the bigWig files?

Your answers will be highly appreciated.

Thanks


Sorry for the late reply. Here are the answers to the different points:

  1. Yes, each sample is normalized separately, and the command you posted takes care of it. When you call peaks with the SPMR option, MACS will generate two bedGraph files (which you can convert to the more convenient bigWig format), one for your treatment and one for your input. Both files will be "SPMR" normalized.

    The 4th column of ".bdg" files represents the signal, after normalization in your case. Read about it here: https://genome.ucsc.edu/goldenpath/help/bedgraph.html

  2. Yes, it is scaled: by default the larger dataset is scaled down towards the smaller one. The MACS documentation explains this.

  3. bigWig files are signal files. Bam files are alignment files. When loaded into a genome browser (e.g. IGV), both should show a similar trend. The bigWig file will be smaller (compressed), and its score reflects whatever normalization you applied. ".bw" files cover the whole genome, unless you generated them on a portion of the genome (some programs, like deeptools, allow that).

  4. For visualization, the bigWig files are better since they are normalized (unless you did not apply any normalization), faster to load and faster to exchange. Sometimes, though, we do look at the bam files too. It depends on what you want to look at.
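As a rough illustration of points 1 and 2, the Python sketch below reads a few bedGraph lines and pulls out the 4th column, then computes the factor that would bring a 16M-read control down to a 6M-read treatment. The bedGraph text, the function names and the use of raw read counts (rather than the counts left after duplicate filtering) are assumptions for the example, not MACS2 internals.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int    # 0-based start
    end: int      # end (exclusive)
    value: float  # 4th column: the signal over this interval


def parse_bedgraph(text: str) -> List[Interval]:
    """Parse bedGraph text, skipping blank, track, browser and comment lines."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        intervals.append(Interval(chrom, int(start), int(end), float(value)))
    return intervals


def downscale_factor(n_treat: int, n_ctrl: int) -> float:
    """Factor applied to the larger library so that it matches the smaller one."""
    return min(n_treat, n_ctrl) / max(n_treat, n_ctrl)


# Made-up bedGraph content for illustration.
example = """track type=bedGraph name="toy"
chr1\t0\t100\t0.0
chr1\t100\t150\t0.6
chr1\t150\t300\t1.2
"""
for iv in parse_bedgraph(example):
    print(iv.chrom, iv.start, iv.end, iv.value)

# 6M treatment reads vs 16M control reads.
print(downscale_factor(6_000_000, 16_000_000))
```

The point of the first part is that each bedGraph line is just an interval plus one number, so the 4th column is a signal value for that interval and not a fold enrichment.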

Best and hope this will help,

Pierre


Hi Pierre,

Thanks very much for your detailed explanation. For point 3, "-p is given because there are different ways of normalization. You can normalize by RPM, RPGC ...": what should be set for -p if I want to normalize by RPGC?

Kylie
