Question

ChIP-seq analysis guidance for a beginner

0

5.3 years ago

DM95 • 0

Hello!

I have what is probably a very simple question, but I need some help exploring my ChIP-seq data. I have three different samples: 1. untreated, 2. treated with a stimulant, 3. treated with an inhibitor.

The first thing I want to know is where binding is increased in 2 compared to 1. From this I want to obtain a list of genes where binding of the protein of interest is increased, saved in a new file. Then I want to use that file to see where binding is decreased in 3, to find out at which genes binding is reduced after treatment with the inhibitor.

The idea of the experiment is that we treat the cells with a stimulant that induces protein binding, and then follow the stimulant with an inhibitor to see where protein binding is decreased by the inhibitor. The goal is a list of genes where protein binding is not disturbed by the inhibitor versus genes where it is disturbed.

Does anyone have some guidance on how to approach this? I am a complete novice when it comes to bioinformatics and I could use some pointers.

Thank you!

ChIP-Seq R Python protein enrichment • 2.8k views

ADD COMMENT • link updated 5.3 years ago by jared.andrews07 ★ 19k • written 5.3 years ago by DM95 • 0

0

Have you already obtained the sequences? I mean the places where your protein binds.

ADD REPLY • link 5.3 years ago by Antonio R. Franco ★ 5.2k

0

I am sorry, I should have been clearer. I have been given bigWig files for each of the samples.

ADD REPLY • link 5.3 years ago by DM95 • 0

0

Do you have experimental replicates?

ADD REPLY • link 5.3 years ago by ATpoint 89k

0

Yes, two replicates for each sample

ADD REPLY • link 5.3 years ago by DM95 • 0

score 3 · Answer 1 · 2020-06-30

3

5.3 years ago

jared.andrews07 ★ 19k

How many replicates for each condition do you have?

Generally, the ChIP-seq pipeline goes:

1. Sequencing QC (FastQC)
2. Alignment (many options here)
3. Peak calling & peak annotation (MACS2, ZINBA, and many other peak callers. ChIPseeker is popular for peak annotation, though it takes a simple approach. Some differential binding approaches, like csaw, don't require peaks at all.)
4. IP QC (ChIPQC)
5. Differential binding analysis (csaw or DiffBind)
6. Other downstream analyses & visualizations: motif analyses, enrichment analyses, etc.
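To give a feel for what the peak annotation step in the list above does, here is a minimal sketch that assigns each peak to the gene with the nearest transcription start site (TSS). It is not how ChIPseeker or any other package is implemented. The gene names, coordinates, the `nearest_gene` helper, and the 2 kb `max_dist` cutoff are all made up for illustration, and it only handles a single chromosome.

```python
from bisect import bisect_left

# Hypothetical TSS positions on one chromosome, sorted by position (toy values).
TSS = [("GENE_A", 1_000), ("GENE_B", 5_000), ("GENE_C", 12_000)]


def nearest_gene(peak_start, peak_end, tss=TSS, max_dist=2_000):
    """Return the gene whose TSS is closest to the peak midpoint.

    Returns None when the closest TSS is farther away than max_dist
    (an arbitrary cutoff chosen for this sketch).
    """
    mid = (peak_start + peak_end) // 2
    positions = [pos for _, pos in tss]
    i = bisect_left(positions, mid)
    # Only the TSS just left and just right of the midpoint can be nearest.
    candidates = tss[max(i - 1, 0): i + 1]
    name, pos = min(candidates, key=lambda gene: abs(gene[1] - mid))
    return name if abs(pos - mid) <= max_dist else None


# Toy peaks as (start, end) intervals.
peaks = [(900, 1_100), (4_000, 4_400), (8_000, 8_200)]
annotated = {peak: nearest_gene(*peak) for peak in peaks}
print(annotated)
```

Real annotation tools also consider strand, gene bodies, and promoter windows, so treat the output of something this simple as a rough first look only.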

Here is a workflow with helpful context for each step that uses R packages from Bioconductor to perform an end-to-end analysis. It is likely worth your time even if you don't use all of those packages, as it will explain various steps and what you should expect to see during QC.

ADD COMMENT • link 5.3 years ago by jared.andrews07 ★ 19k

0

There are 2 replicates for each condition. All the files I have are in bigWig format.

ADD REPLY • link 5.3 years ago by DM95 • 0

1

Okay, replicates are good.

Then your first step is to get the original data in FASTQ (or at least BAM) format. bigWig files have already been processed and are meant for visualization purposes. They are typically scaled, but are not directly comparable to one another in most cases. This is especially true when you don't know how they've been generated. You will not be able to perform the analyses you want with only those files.
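A toy illustration of why scaled tracks are hard to compare when you don't know how they were made. The numbers are invented, and counts-per-million (CPM) is just one common scaling. I am not assuming it is what was used for your files.

```python
def cpm_scale(bin_counts, total_reads):
    """Scale raw per-bin read counts to counts per million mapped reads."""
    return [count * 1_000_000 / total_reads for count in bin_counts]


# The same three genomic bins in two hypothetical samples.
# Sample B was sequenced twice as deeply, so its raw counts are doubled.
raw_a, depth_a = [10, 40, 10], 10_000_000
raw_b, depth_b = [20, 80, 20], 20_000_000

track_a = cpm_scale(raw_a, depth_a)
track_b = cpm_scale(raw_b, depth_b)

# Scaled the same way, the two tracks agree even though raw counts differ.
print(track_a, track_b)

# If B had instead been written out unscaled, comparing it against the
# CPM-scaled A would suggest a large difference that is purely an artifact.
track_b_unscaled = [float(count) for count in raw_b]
apparent_fold = [b / a for a, b in zip(track_a, track_b_unscaled)]
print(apparent_fold)
```

The scaled values also no longer carry the raw read counts, which is what count-based tools such as DiffBind or csaw work from. That is the main reason to go back to the BAM or FASTQ files.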

ADD REPLY • link 5.3 years ago by jared.andrews07 ★ 19k

0

OK, thank you! I will work on that. Once I have obtained those files, how would I proceed to do the analysis I want? Also, can anything be inferred from bigWig files, like statistics?

ADD REPLY • link 5.3 years ago by DM95 • 0

1

Depends. If they're FASTQ files, you'd want to do some QC to ensure the sequencing worked properly. Then you'd move on to the alignment and additional steps as I list above. If they're BAM files, you can also run them through FastQC, but would be able to skip the alignment step, assuming whoever did the alignment used both an aligner and parameters that make sense. You should try to get that information from whoever dealt with the data initially if that's the case. I've edited my answer to include a link to an end-to-end workflow (with code) that should help you get started. It is Bioconductor-centric, but still contains lots of useful info even if you don't use those packages. It also goes through alignment and some QC at the end of the article.
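Once a differential binding tool has given you per-gene results, the two-step comparison from the question comes down to set logic. A rough sketch follows. The gene names, fold changes, and FDR values are invented, the `significant` helper and its cutoffs are my own placeholders, and I am assuming the second contrast is inhibitor versus stimulant.

```python
# Hypothetical per-gene results: gene -> (log2 fold change, FDR).
stim_vs_untreated = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (1.5, 0.03),
    "GENE_C": (0.2, 0.80),
    "GENE_D": (1.8, 0.01),
}
inhib_vs_stim = {
    "GENE_A": (-1.9, 0.002),
    "GENE_B": (-0.1, 0.90),
    "GENE_C": (-1.2, 0.04),
    "GENE_D": (-1.4, 0.20),
}


def significant(results, direction, fdr_cutoff=0.05, min_abs_lfc=1.0):
    """Genes changing in the given direction (+1 up, -1 down) past both cutoffs.

    The cutoffs are arbitrary defaults for this sketch, not recommendations.
    """
    return {
        gene
        for gene, (lfc, fdr) in results.items()
        if fdr <= fdr_cutoff and direction * lfc >= min_abs_lfc
    }


induced = significant(stim_vs_untreated, +1)   # binding gained with stimulant
reduced = significant(inhib_vs_stim, -1)       # binding lost with inhibitor

inhibitor_sensitive = induced & reduced        # gained, then lost again
not_called_reduced = induced - reduced         # gained, no significant loss

print(sorted(inhibitor_sensitive), sorted(not_called_reduced))
```

Keep in mind that "not significantly reduced" is not the same as "unaffected". With two replicates per condition, some real changes will miss the cutoff, as GENE_D does in this toy data.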

Stats on the bigWigs are a lost cause for the most part. You can look at them in a genome browser such as IGV just to spot check that your ChIP actually worked. Pick a gene you know should have binding and ensure that you can visually see read pileups in each of your samples.
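If you would rather do the same spot check in code than by eye, something like the following could work with the pyBigWig package. It relies on the `stats` method of an open bigWig handle. The `mean_signal` and `spot_check` helpers, the file names, and the coordinates are placeholders of mine, so treat it as a sketch and not a tested recipe for your data.

```python
def mean_signal(bw, chrom, start, end):
    """Mean signal over one region, read from an open bigWig handle.

    `bw` is expected to behave like a pyBigWig file object, whose
    `stats` method returns a list with one value per requested bin
    (None when the region has no data).
    """
    value = bw.stats(chrom, start, end, type="mean")[0]
    return 0.0 if value is None else value


def spot_check(handles, chrom, start, end):
    """Mean signal in the same region for every sample in `handles`."""
    return {
        name: mean_signal(bw, chrom, start, end)
        for name, bw in handles.items()
    }


# Usage with real files (paths and coordinates are hypothetical):
#
#   import pyBigWig
#   handles = {
#       "untreated_rep1": pyBigWig.open("untreated_rep1.bw"),
#       "stimulant_rep1": pyBigWig.open("stimulant_rep1.bw"),
#   }
#   print(spot_check(handles, "chr1", 1_000_000, 1_002_000))
```

This only tells you whether there is signal at a known target in each sample. Because the tracks may be scaled differently, the numbers should not be compared between samples.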

ADD REPLY • link 5.3 years ago by jared.andrews07 ★ 19k