Hello! I am trying to overlap my ChIP-seq peaks with the differentially expressed genes I obtained from an RNA-seq analysis. Let me explain in more detail.
I have ChIP-seq peaks for my protein of interest, X, and I used HOMER to annotate them. Let's say 30% of my peaks fall in promoter regions. I also have RNA-seq data for wild type versus X-knockout, and I used the DESeq2 package in R to obtain a list of differentially expressed genes. I now want to see whether the genes with promoter peaks overlap with my list of differentially expressed genes, i.e. whether protein X is actually binding and regulating the expression of these genes. Is there a tool that can do this and also assess the overlap statistically? Of course, even a tool that directly overlaps ChIP-seq data with RNA-seq data would be great :)
You can use Fisher's exact test (fisher.test() in R) for the statistics.
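As a rough illustration of how such a test could be set up: build a 2x2 table of genes (promoter peak yes/no by differentially expressed yes/no) and ask whether bound genes are enriched among the DE genes. The sketch below is in Python rather than R and computes the one-sided (enrichment) p-value directly from the hypergeometric distribution. The function name and the toy gene sets are made up for illustration, and the "universe" is assumed to be all genes that were tested for differential expression.

```python
from math import comb

def fisher_enrichment(bound, de, universe):
    """One-sided Fisher's exact test: are bound genes enriched among DE genes?"""
    universe = set(universe)
    bound = set(bound) & universe
    de = set(de) & universe
    a = len(bound & de)             # promoter peak and DE
    b = len(bound - de)             # promoter peak, not DE
    c = len(de - bound)             # DE, no promoter peak
    d = len(universe) - a - b - c   # neither
    n, k, m = a + b + c + d, a + b, a + c
    # P(X >= a) where X counts DE genes among the k bound genes
    tail = sum(comb(m, x) * comb(n - m, k - x) for x in range(a, min(k, m) + 1))
    return (a, b, c, d), tail / comb(n, k)

# Toy data (hypothetical gene names)
universe = ["g%d" % i for i in range(1, 11)]
table, p_value = fisher_enrichment(["g1", "g2", "g3", "g4"],
                                   ["g1", "g2", "g3", "g5"], universe)
print(table, p_value)
```

In R, the same four counts would go into a 2x2 matrix passed to fisher.test().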
Regarding "overlapping" data, it depends on what you mean. I would personally make a combined heatmap of the ChIP and RNAseq data (at least for the DE genes). You can use deepTools for this, though it'd be easiest if you used the develop branch from GitHub, since the computeMatrixOperations command won't otherwise be available until the next release (ETA November 1). The general steps would be:
1. Use bamCoverage to generate bigWig files (possibly input-normalized in the case of ChIPseq).
2. Use computeMatrix on the ChIPseq bigWig files, likely with reference-point and a reasonable setting for -b.
3. Use computeMatrix scale-regions on the RNAseq bigWig files, likely using the --metagene option.
4. Use computeMatrixOperations cbind with the output of steps 2 and 3.
5. Make a heatmap with plotHeatmap.
This allows you to see the differences even in cases where there happened to not be a peak called.
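The steps above could be strung together roughly as follows. This is only a sketch: it assembles the command lines in Python and prints them without running anything. All file names are placeholders, the 2000 bp flank is an arbitrary choice, input normalization is not shown, and using the same regions file with --sortRegions keep for both matrices is an assumption made so that the rows line up for cbind.

```python
import shlex

def deeptools_commands(chip_bam, rna_bam, regions, flank=2000):
    """Return the deepTools command lines as argv lists (nothing is executed)."""
    flank = str(flank)
    return [
        # Step 1: coverage tracks for both experiments
        ["bamCoverage", "-b", chip_bam, "-o", "chip.bw"],
        ["bamCoverage", "-b", rna_bam, "-o", "rna.bw"],
        # Step 2: ChIP signal around a reference point
        ["computeMatrix", "reference-point", "-S", "chip.bw", "-R", regions,
         "-b", flank, "-a", flank, "--sortRegions", "keep", "-o", "chip.mat.gz"],
        # Step 3: RNA signal over scaled regions, exons only
        ["computeMatrix", "scale-regions", "-S", "rna.bw", "-R", regions,
         "--metagene", "--sortRegions", "keep", "-o", "rna.mat.gz"],
        # Step 4: paste the two matrices side by side
        ["computeMatrixOperations", "cbind", "-m", "chip.mat.gz", "rna.mat.gz",
         "-o", "combined.mat.gz"],
        # Step 5: plot
        ["plotHeatmap", "-m", "combined.mat.gz", "-o", "heatmap.pdf"],
    ]

for cmd in deeptools_commands("chip.bam", "rna.bam", "transcripts.gtf"):
    print(shlex.join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) once the paths and options have been adapted to the actual data.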
Hello Devon. I followed steps 1-3 and am now stuck at step 4. Below is the command I ran:
computeMatrixOperations cbind -m peak_sorted_matrix rna_16hr_sorted_matrix -o output.mat.gz
Error :
Traceback (most recent call last):
File "/home/anupriya/.local/bin/computeMatrixOperations", line 11, in <module>
main(args)
File "/home/anupriya/.local/lib/python2.7/site-packages/deeptools/computeMatrixOperations.py", line 677, in main
cbindMatrices(hm, args)
File "/home/anupriya/.local/lib/python2.7/site-packages/deeptools/computeMatrixOperations.py", line 408, in cbindMatrices
hm.matrix.matrix = np.hstack((hm.matrix.matrix, np.empty(hm2.matrix.matrix.shape)))
File "/home/anupriya/miniconda2/lib/python2.7/site-packages/numpy/core/shape_base.py", line 288, in hstack
return _nx.concatenate(arrs, 1)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
I used the same '-a' and '-b' options in the computeMatrix commands for ChIP-seq and RNA-seq, but still got this error. How can I fix this?
Both BED files need to be of the same length and sorted such that row N in one file corresponds to row N in the other (computeMatrixOperations just merges them by row, since it otherwise has no way to determine which rows belong together).
Ensure that computeMatrix keeps the input file order (--sortRegions keep).
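A quick sanity check along these lines can be run on the two region files before calling cbind. The sketch below is not part of deepTools; the function names are hypothetical, and it assumes plain whitespace-separated BED files whose rows are expected to agree on the first three columns (pass other column indices if the rows should be matched on a name column instead).

```python
import io

def read_bed_keys(handle, cols=(0, 1, 2)):
    """Return one key tuple per data row of a BED file."""
    keys = []
    for line in handle:
        # Skip blank lines and common header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        keys.append(tuple(fields[i] for i in cols))
    return keys

def first_mismatch(keys_a, keys_b):
    """Return None if rows pair up one-to-one, else the 0-based index of the first problem."""
    for i, (a, b) in enumerate(zip(keys_a, keys_b)):
        if a != b:
            return i
    if len(keys_a) != len(keys_b):
        # One file simply has extra rows at the end
        return min(len(keys_a), len(keys_b))
    return None

# In-memory stand-ins for two BED files (made-up coordinates)
bed_a = io.StringIO("Chromosome\t100\t200\nChromosome\t300\t400\n")
bed_b = io.StringIO("Chromosome\t100\t200\nChromosome\t300\t400\nChromosome\t10072\t10148\n")
print(first_mismatch(read_bed_keys(bed_a), read_bed_keys(bed_b)))
```

With real files, open("a.bed") and open("b.bed") would replace the StringIO objects.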
Sorry for the naive question, but when running computeMatrix on the ChIPseq file, what do you use for -R? I keep getting this error:
I'm guessing it wants the bed file with the peaks - but isn't the point of this approach to not use the peaks as that is limiting?
Thanks for your help.
We usually use transcripts.
By that, do you mean something like the GTF file you use for RNAseq analysis?
GTF or BED, yes
It appears you used a different GTF or BED file to produce the two matrices. Can you post the commands you used to create both?
Hi Devon, below are the commands and the BED files I used:
Hi Devon,
But what if I have different or extra rows in the exon BED file (like this one: Chromosome 10072 10148)? Should I discard them? Won't I be losing data then?
FYI, I am using an exon file with the RNA-seq data and a transcript file with the ChIP-seq data.
It's unclear what should be matched together if you have extra rows. In that case you must necessarily lose data (not that a few rows matter).
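One way to drop the unmatched rows is to keep only the rows whose identifier occurs in both files, in the order of the first file, so that row N means the same thing in both. The following is a minimal sketch, assuming each row carries a shared identifier in the fourth (name) column and that there is one row per feature; the function name and the example rows are hypothetical.

```python
def keep_common_rows(rows_a, rows_b, key_col=3):
    """Keep rows whose key appears in both lists, ordered as in rows_a."""
    by_key_b = {}
    for row in rows_b:
        by_key_b.setdefault(row[key_col], row)  # first occurrence wins
    out_a, out_b, seen = [], [], set()
    for row in rows_a:
        key = row[key_col]
        if key in by_key_b and key not in seen:
            seen.add(key)
            out_a.append(row)
            out_b.append(by_key_b[key])
    return out_a, out_b

# Made-up rows: chromosome, start, end, name
transcripts = [["Chromosome", "100", "900", "geneA"],
               ["Chromosome", "1500", "2500", "geneB"]]
exons = [["Chromosome", "10072", "10148", "geneX"],
         ["Chromosome", "1600", "2400", "geneB"],
         ["Chromosome", "150", "800", "geneA"]]
kept_transcripts, kept_exons = keep_common_rows(transcripts, exons)
print([r[3] for r in kept_transcripts], [r[3] for r in kept_exons])
```

Writing the two filtered lists back out as tab-separated files gives region files of equal length and matching order; the unmatched geneX row is the data that is lost.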
Hi Devon, I took the rows common to exon.bed and transcripts.bed and ran the remaining commands:
and got this plot. Why are the peaks not covering the whole plot? Did I miss something? https://www.dropbox.com/s/ls6742nyqblt8k9/trial.pdf?dl=0
They don't cover the whole plot because the two datasets are different sizes. In general, using --perGroup with a dataset like that doesn't make sense, as the columns of data for each sample aren't comparable (only the rows are).
Hi Devon, I was wondering, would using either reference-point or scale-regions in computeMatrix for both the ChIP-seq and RNA-seq data work?
Or, if I remove the --perGroup option and create the graph like this: https://www.dropbox.com/s/gw0rc9jgsq09e69/trial1.pdf?dl=0, can the two then be compared?
Yes, though it looks like you have an older version of deepTools, since I think I fixed the issue with the tick labels not being correct in more recent versions.
Thanks a lot Devon for solving the problem!