Which Pipeline To Analyze Dnase-Seq For Significant Regulatory-Region Finding?
11.6 years ago

Dear community,

Our lab is currently trying to find important regulatory regions in breast cancer cells. For that purpose we sequenced DNase I hypersensitive fragments at different time points, without replicates, as single-end reads of 18 bp each.

I can summarize that in DNase-seq the workflow goes like:

  1. Lab work/Sequencing
  2. Quality control check/read trimming
  3. Mapping to reference genome (BWA or bowtie)
  4. Identification of regions with signal
  5. Differential signal enrichment between conditions
  6. Motif analysis of TF and Histone marks enrichment
  7. Further specific analysis

My first question: what is your general pipeline? Is it more or less like the one I show?

Also, I have seen that there are not many programs or pipelines developed to analyze DNase-seq data for signal enrichment.

I was planning to use F-seq because I found it clear enough to proceed with, but as far as I know F-seq doesn't compute any significance test to rank regions. So, at this point, how do you assess region significance?

Thanks for your support and help!


A minor point: without replicates, hardly any statistical analysis can be done.

11.1 years ago
dnaseiseq ▴ 220

For (step 7) you might be interested in trying the DNaseR package, which will be released in the upcoming version of Bioconductor. But you will need to have deep coverage...

DNase I footprinting analysis of DNase-seq data

http://bioconductor.org/packages/devel/bioc/html/DNaseR.html

11.6 years ago
Ying W ★ 4.3k

IMO steps 4 & 5 are key, and lots of work is still being done on them. I agree that MACS is probably not the best peak caller to use for DNase-seq. You might find this review from last year helpful; it goes over some different methods: http://www.ncbi.nlm.nih.gov/pubmed/23118738

What you could do is identify peaks in each of your samples (step 4), then overlap them to determine differential binding (step 5), and scan those regions for motifs (step 6).
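A minimal sketch of the overlap step suggested above, in Python. It assumes peaks have already been called per sample and loaded as `(chrom, start, end)` tuples in half-open BED coordinates; the function names are hypothetical. Note that this only reports presence or absence of an overlapping peak, which is not a statistical test of differential signal.

```python
from collections import defaultdict


def index_by_chrom(peaks):
    """Group (chrom, start, end) peaks into a dict: chrom -> list of (start, end)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    return by_chrom


def split_by_overlap(peaks_a, peaks_b):
    """Split the peaks of sample A into (shared, only_a).

    A peak is 'shared' if it overlaps at least one peak of sample B by one
    base or more. Coordinates are treated as half-open, as in BED files.
    """
    index_b = index_by_chrom(peaks_b)
    shared, only_a = [], []
    for chrom, start, end in peaks_a:
        hit = any(
            b_start < end and start < b_end
            for b_start, b_end in index_b.get(chrom, [])
        )
        (shared if hit else only_a).append((chrom, start, end))
    return shared, only_a


# Toy peaks for two time points (made-up coordinates, for illustration only).
time_0 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
time_1 = [("chr1", 150, 250), ("chr2", 200, 300)]
shared, only_t0 = split_by_overlap(time_0, time_1)
```

The linear scan per chromosome is fine for a toy example; for genome-wide peak sets an interval tool such as `bedtools intersect` would be the more practical choice.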


Thanks, that's a good point. But how can I assign statistical significance to the regions found? I've seen a paper in which the authors fit region scores to a gamma distribution and then find the score that corresponds to a p-value of 0.05. I have tried, but I find this step a bit complicated. Any suggestions? Thanks again!
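A rough sketch of the gamma-fitting idea mentioned above, using only the Python standard library. The paper's exact procedure is not given in this thread, so everything here is an assumption: the fit is a simple method-of-moments estimate over all region scores (which treats most regions as background), and the function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gamma_moments(scores):
    """Method-of-moments gamma fit. Returns (shape, scale)."""
    mean = fmean(scores)
    var = pvariance(scores, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError("scores must be positive and not all identical")
    return mean * mean / var, var / mean


def gamma_sf(x, shape, scale):
    """Upper-tail probability P(X >= x) of a gamma distribution.

    Uses the power series of the regularized lower incomplete gamma
    function; adequate for moderate x, not for extreme tails.
    """
    if x <= 0:
        return 1.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 0
    while term > total * 1e-15 and n < 100000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_cdf = shape * math.log(z) - z - math.lgamma(shape) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_cdf))


def score_threshold(shape, scale, alpha=0.05):
    """Smallest score whose upper-tail probability is <= alpha (bisection)."""
    lo, hi = 0.0, shape * scale
    while gamma_sf(hi, shape, scale) > alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if gamma_sf(mid, shape, scale) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# Made-up region scores, for illustration only.
region_scores = [1.0, 2.0, 3.0, 4.0]
shape, scale = fit_gamma_moments(region_scores)
cutoff = score_threshold(shape, scale, alpha=0.05)
p_values = [gamma_sf(s, shape, scale) for s in region_scores]
```

Regions scoring above `cutoff` would have a nominal p-value below 0.05 under the fitted background. With many regions tested, a multiple-testing correction would still be needed, and without replicates these p-values say nothing about reproducibility.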


There has been some work done for ChIP signals (DiffBind and RSeg are two examples), but I don't know of any for DNase-seq data. I believe one of the recent ENCODE papers had a custom method to do differential chromatin analysis: http://pubmed.gov/22955618 (more: http://www.nature.com/encode/threads/chromatin-patterns-at-transcription-factor-binding-sites). I also just found a review paper that you might find useful: http://www.ncbi.nlm.nih.gov/pubmed/23118738


If DNase-seq peaks are called by MACS, is there any standard threshold for annotating genes as accessible or not accessible (based on the peak tag density score)?

11.4 years ago
jasper1918 ▴ 10

Hi, I just came across this, and while it has been a while, I'm sure there are many people struggling to figure this out. The F-Seq program does in fact do significance testing. Instead of outputting BED, set the output format to npf (narrowPeak format). It has the traditional 6+3 BED format with p-values.


Hi jasper1918, could you provide more information? Which statistical test is performed in F-seq to calculate the p-values?


Hi jasper1918,

As far as I know, those are not p-values: the pvalue column of the NPF format is just assigned "-1", since the program does not do significance tests. Could you shed more light on how you obtained the p-values with F-Seq?

I tried setting output format as NPF, but I always get "-1" for pvalues.
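One quick way to check what a peak file actually contains is to count how many rows have a real value in the pValue column. The sketch below assumes the file follows the ENCODE narrowPeak layout, where pValue is the 8th tab-separated column and `-1` means "not assigned"; the function name and the example rows are made up for illustration.

```python
def pvalue_column_summary(lines):
    """Count narrowPeak rows with and without an assigned pValue.

    Returns (assigned, unassigned). In narrowPeak, the pValue column
    (8th field) holds -1 when no p-value was computed.
    """
    assigned = unassigned = 0
    for line in lines:
        # Skip blank lines and optional header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[7]) == -1:
            unassigned += 1
        else:
            assigned += 1
    return assigned, unassigned


# Made-up rows in narrowPeak layout, for illustration only.
example_rows = [
    "track name=example\n",
    "chr1\t100\t200\tpeak1\t500\t.\t12.5\t-1\t-1\t50\n",
    "chr1\t300\t400\tpeak2\t700\t.\t20.1\t3.2\t-1\t40\n",
]
summary = pvalue_column_summary(example_rows)
```

For a real file, pass an open file handle instead of the list, e.g. `pvalue_column_summary(open(path))`, where `path` is your own output file. If every row is unassigned, the peak caller did not compute p-values for that run.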
