Question

Which Pipeline To Use To Analyze DNase-Seq For Finding Significant Regulatory Regions?

2

11.6 years ago

daniel.soronellas ▴ 330

Dear community,

Our lab is currently trying to identify important regulatory regions in breast cancer cells. For that purpose we have sequenced DNase I hypersensitive fragments at different time points, without replicates, as single-end reads of 18 bp each.

As I understand it, the DNase-seq workflow goes like this:

1. Lab work/Sequencing
2. Quality control check/read trimming
3. Mapping to reference genome (BWA or bowtie)
4. Identification of regions with signal
5. Differential signal enrichment between conditions
6. Motif analysis of TFs and histone mark enrichment
7. Further specific analysis

My first question: what is your general pipeline? Is it more or less like the one above?

Also, I have seen that there are not many tools/pipelines developed to analyze DNase-seq data for signal enrichment:

F-seq (ChIP-seq and DNase-seq) : http://fureylab.web.unc.edu/software/fseq/
HotSpot (ChIP-seq and DNase-seq) : http://www.uwencode.org/proj/hotspot-ptih/
MACS (I'm not sure if it's really a good choice, but I saw some papers using it) : http://liulab.dfci.harvard.edu/MACS/

I was planning to use F-seq because I found it clear enough to proceed with, but as far as I know F-seq doesn't compute any significance test to rank regions. So, at this point, how do you assess region significance?

Thanks for your support and help!

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 11.6 years ago by daniel.soronellas ▴ 330

1

A minor point: without replicates, hardly any statistical analysis can be done.

ADD REPLY • link 11.4 years ago by Vitis ★ 2.6k

score 2 · Answer 1 · 2013-10-11

2

11.1 years ago

dnaseiseq ▴ 220

For step 7, you might be interested in trying the DNaseR package, which will be released in the upcoming version of Bioconductor. But you will need deep coverage...

DNase I footprinting analysis of DNase-seq data

http://bioconductor.org/packages/devel/bioc/html/DNaseR.html

ADD COMMENT • link 11.1 years ago by dnaseiseq ▴ 220

score 1 · Answer 2 · 2013-04-26

1

11.6 years ago

Ying W ★ 4.3k

IMO steps 4 & 5 are key, and lots of work is still being done on them. I agree that MACS is probably not the best peak caller to use for DNase-seq. You might find this review from last year helpful; it goes over some different methods: http://www.ncbi.nlm.nih.gov/pubmed/23118738

What you could do is identify peaks in each of your samples (step 4), then overlap them to determine differential binding (step 5), and scan those for motifs (step 6).
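
To make the overlap idea concrete, here is a minimal sketch in plain Python. The peak coordinates and the names `t0`, `t1`, and `split_by_overlap` are made up for illustration, and intervals are assumed to be BED-style (0-based, half-open). Note that this only gives presence/absence of a peak per sample; it is not a statistical test of differential signal.

```python
from typing import List, Tuple

# (chrom, start, end), BED-style: 0-based start, half-open end
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks are on the same chromosome and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_overlap(sample_a: List[Peak], sample_b: List[Peak]):
    """Split peaks of sample A into those with and without an overlap in sample B."""
    shared, only_a = [], []
    for peak in sample_a:
        if any(overlaps(peak, other) for other in sample_b):
            shared.append(peak)
        else:
            only_a.append(peak)
    return shared, only_a


# Hypothetical peak calls for two time points
t0 = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]
t1 = [("chr1", 150, 260), ("chr2", 300, 400)]

shared, only_t0 = split_by_overlap(t0, t1)
print("shared with t1:", shared)
print("only in t0:", only_t0)
```

The quadratic scan is fine for a toy example; for genome-scale peak sets a dedicated interval tool is the more practical choice.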

ADD COMMENT • link 11.6 years ago by Ying W ★ 4.3k

0

Thanks, that's a good point. But how could I assess the statistical significance of the regions found? I've seen a paper that fits region scores to a gamma distribution and then finds the score that corresponds to a p-value of 0.05... I have tried, but I find this step a bit complicated. Any suggestions? Thanks again!
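
For reference, this is roughly what I understand the gamma-threshold idea to be, as a sketch only. The fitting method (method of moments), the example scores, and all function names here are my own assumptions, not taken from the paper. It uses only the standard library; with SciPy installed, `scipy.stats.gamma` would be the more usual route.

```python
import math
from statistics import mean, variance


def fit_gamma_moments(scores):
    """Method-of-moments estimates (shape, scale) of a gamma distribution."""
    m = mean(scores)
    v = variance(scores)
    return m * m / v, v / m


def gamma_sf(x, shape, scale):
    """P(X > x) for a gamma distribution, via the series expansion of the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_cdf = shape * math.log(z) - z - math.lgamma(shape) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_cdf))


def score_threshold(shape, scale, alpha=0.05):
    """Smallest score whose upper-tail probability is at most alpha (bisection)."""
    lo, hi = 0.0, shape * scale  # start the search at the distribution mean
    while gamma_sf(hi, shape, scale) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gamma_sf(mid, shape, scale) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# Made-up region scores, for illustration only
scores = [0.8, 1.1, 1.3, 0.9, 2.4, 1.7, 1.0, 3.2, 1.5, 1.2]
shape, scale = fit_gamma_moments(scores)
cutoff = score_threshold(shape, scale, alpha=0.05)
significant = [s for s in scores if s > cutoff]
print(f"shape={shape:.3f} scale={scale:.3f} cutoff={cutoff:.3f}")
print("regions above cutoff:", significant)
```

One caveat with this sketch: the fit uses all region scores, real signal included, so the resulting p-values are only as good as the assumption that the scores are mostly background.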

ADD REPLY • link 11.6 years ago by daniel.soronellas ▴ 330

1

There has been some work done for ChIP signals (DiffBind and RSeg are two examples), but I don't know of any for DNase-seq data. I believe one of the recent ENCODE papers had a custom method to do differential chromatin: http://pubmed.gov/22955618 (more: http://www.nature.com/encode/threads/chromatin-patterns-at-transcription-factor-binding-sites). I also just found a review paper that you might find useful: http://www.ncbi.nlm.nih.gov/pubmed/23118738

ADD REPLY • link 11.6 years ago by Ying W ★ 4.3k

0

If DNase-seq peaks are called with MACS, is there a standard threshold for annotating genes as accessible or not accessible (based on the peak tag density score)?

ADD REPLY • link 8.6 years ago by Bioinformatist Newbie ▴ 270

Ram · Answer 3 · 2013-07-17

1

11.4 years ago

jasper1918 ▴ 10

Hi, I just came across this, and while it has been a while, I'm sure there are many struggling to figure this out. The F-Seq program does in fact do significance testing. Instead of outputting as bed, set the output format to npf (narrowPeak format). It has the traditional 6+3 bed format with p-values.

ADD COMMENT • link 11.4 years ago by jasper1918 ▴ 10

0

Hi jasper1918, could you provide more information? Which statistical test is performed in F-seq to calculate the p-values?

ADD REPLY • link 11.1 years ago by dnaseiseq ▴ 220

0

Hi jasper1918,

As far as I know those are not p-values: the pvalue column of the NPF format is just assigned "-1", since the program does not do significance tests. Could you shed more light on how you obtained the p-values with F-Seq?

I tried setting the output format to NPF, but I always get "-1" for the p-values.
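
In case anyone wants to check their own output, here is a small sketch that counts how many narrowPeak rows actually carry a p-value. In the narrowPeak format the 8th column is the p-value, and "-1" there means none was assigned. The example rows and the function name `count_pvalues` are made up for illustration; they are not real F-Seq output.

```python
def count_pvalues(lines):
    """Return (assigned, missing) counts for the pValue column (8th field)
    of narrowPeak-style rows; -1 in that column means no p-value was assigned."""
    assigned = missing = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and header-style lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(fields)}")
        if float(fields[7]) == -1:
            missing += 1
        else:
            assigned += 1
    return assigned, missing


# Made-up rows in narrowPeak layout
example = [
    "chr1\t100\t250\t.\t0\t.\t0.12\t-1\t-1\t-1",
    "chr1\t900\t1100\t.\t0\t.\t0.30\t-1\t-1\t-1",
]
assigned, missing = count_pvalues(example)
print(f"rows with p-value: {assigned}, rows without: {missing}")
```

To run it on a real file, pass the open file handle instead of the list, e.g. `count_pvalues(open(path))`, where `path` is your own output file.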

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by thecuriousbiologist ▴ 550
