Question

Single-end control & Paired-end sample in Chip-seq

0

Entering edit mode

7.5 years ago

anu014 ▴ 190

Hello Biostars!

I've been doing Chip-seq analysis for few months now. I decided to use MACS2 for the analysis. It's a very nice tool to deal with any kind of data. But recently I've encountered with a new problem.

I got control which is single-end & treatment sample which is paired-end. In macs2 we give control & sample together for differential peak calling & we've to tell if the files are in BAMPE or BAM format. But for this situation this won't work.

So I searched for this problem in biostars itself & end up getting 2 posts - Combination Of Paired-End And Single-End Samples In Chip-Seq Tf Study Paired end and single end with MACS2

As stated by @Istvan Albert I tried merging both paired end files with fastq-join. But out of 33631672 reads only 269489 were joined. So I tried metagenomics tool ,QIIME's strategy of converting paired-end to single end which is convert reverse strand to forward strand ,by doing reverse complement of reverse read, and just doing 'cat' on forward & reverse complement files to get single file. (Note: these files were converted from FASTQ to FASTA in this process). This strategy also didn't work out and I end up getting a linear peak model.

In the second post (Paired end and single end with MACS2) it was mentioned we can just take forward read & do the analysis, which is contradictory to the first post (https://www.biostars.org/p/87138/). Can I follow this strategy ?

Please help me out. Thank you in advance :)

ChIP-Seq macs peakcalling • 5.1k views

ADD COMMENT • link updated 7.5 years ago by Devon Ryan 104k • written 7.5 years ago by anu014 ▴ 190

1

Entering edit mode

As Devon suggested, I always consider R1 and ignore R2 if a sample is paired-end and rest of the samples are single-ended. This should not be a major problem in calling peaks or quantifying the peaks as far as I know.

ADD REPLY • link 7.5 years ago by GouthamAtla 12k

score 3 · Answer 1 · 2017-06-02

3

Entering edit mode

7.5 years ago

Devon Ryan 104k

Just use read 1 and ignore read 2. This is preferred over trying to merge reads, which is failing in your case anyway.

ADD COMMENT • link 7.5 years ago by Devon Ryan 104k

1

Entering edit mode

Thank you Devon for replying.

I want your suggestion on this comment on post Paired end and single end with MACS2 : "run both files using --format BAM, MACS2 should just use the 5' read of each pair in the PE file. As a SE analysis the fragment length will be calculated by cross correlation, however if you run MACS2 with the PE ChIP on its own then you can get the value 'd' from the log info. Then, when you run the SE analysis with the control you can use --nomodel --extsize 'd' (as its integer value'."

Is it a right way to do? I don't know much details of macs..

ADD REPLY • link 7.5 years ago by anu014 ▴ 190

2

Entering edit mode

I haven't a clue how MACS2 will treat PE alignments if you treat them as SE, so I can't say. It would seem safer to use only read #1 and plug in the known fragment size from the PE data.

ADD REPLY • link 7.5 years ago by Devon Ryan 104k

1

Entering edit mode

Hi Devon, As you suggested, I called peaks using only R1 of the data. It ran successfully for one group with below command :

Command line: callpeak -t /home/DRUG_sorted.bam -c /home/Input_sorted.bam -f BAM -g 4331233 -n /home/DRUG_Input_broad -B --mfold 1 60 --broad --broad-cutoff 0.1 --bw 300 --buffer-size 100000 --qvalue 0.05 --slocal 1000 --llocal 10000 --cutoff-analysis --fe-cutoff 1.0 --nomodel --extsize 277

But it's not working for another group:

Command line: callpeak -t /home/UNTREATED_sorted.bam -c /home/Input_sorted.bam -f BAM -g 4331233 -n /home/UNTREATED_Input_broad -B --mfold 1 60 --broad --broad-cutoff 0.1 --bw 300 --buffer-size 100000 --qvalue 0.05 --slocal 1000 --llocal 10000 --cutoff-analysis --fe-cutoff 1.0 --nomodel --extsize 351

I've tried changing parameters like below : i. tried other extsize too as shown in peak model ii. tried to increase qvalue & broad-cutoff to 0.9 iii. decreased fe-cutoff to 0.1

But nothing seems to be working. How can I get peaks for this combination now?

ADD REPLY • link 7.5 years ago by anu014 ▴ 190

1

Entering edit mode

"not working" as in you get no peaks or "not working" as in you get an error? Have you run a fingerprint (e.g., with plotFingerprint) to judge enrichment of your IP over input?

ADD REPLY • link 7.5 years ago by Devon Ryan 104k

1

Entering edit mode

No files were created except the cutoff_analysis file which was empty with the command. Just now I removed this parameter "--cutoff-analysis" & all the files have been created. But all these files are empty except bedgraph files. The xls file looks like this :

# This file is generated by MACS version 2.1.1.20160309
     # Command line: callpeak -t /home/UT_sorted.bam
     -c /home/Input_sorted.bam
     -f BAM -g 4331233 -n /home/UT_Input_broad
     -B --mfold 1 60 --broad --broad-cutoff 0.1 --bw 300 --buffer-size 100000 --qvalue 0.05 --slocal 1000 --llocal 10000 --fe-cutoff 1.0
     --nomodel --extsize 351
     # ARGUMENTS LIST:
     # name = /home/UT_Input_broad
     # format = BAM
     # ChIP-seq file = ['/home/UT_sorted.bam']
     # control file = ['/home/Input_sorted.bam']
     # effective genome size = 4.33e+06
     # band width = 300
     # model fold = [1, 60]
     # qvalue cutoff for narrow/strong regions = 5.00e-02
     # qvalue cutoff for broad/weak regions = 1.00e-01
     # Larger dataset will be scaled towards smaller dataset.
     # Range for calculating regional lambda is: 1000 bps and 10000 bps
     # Broad region calling is on
     # Paired-End mode is off

     # tag size is determined as 74 bps
     # total tags in treatment: 26792544
     # tags after filtering in treatment: 6980031
     # maximum duplicate tags at the same position in treatment = 1
     # Redundant rate in treatment: 0.74
     # total tags in control: 34417538
     # tags after filtering in control: 8284512
     # maximum duplicate tags at the same position in control = 1
     # Redundant rate in control: 0.76
     # d = 351 chr  start   end length  pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name

ADD REPLY • link 7.5 years ago by anu014 ▴ 190

1

Entering edit mode

I ran plotFingerprint on the data. The distance between input n sample is very less & the input line is lying below sample.. I guess this is the problem. I wish I could upload the plot here.

ADD REPLY • link 7.5 years ago by anu014 ▴ 190