Hello everyone,
I have recently been using data from the Roadmap Epigenomics Project and am a little confused. I have been downloading the NarrowPeak data from http://egg2.wustl.edu/roadmap/web_portal/processed_data.html#ChipSeq_DNaseSeq, under section "c. Peak Calling". The data have the following format:
chrom chromStart chromEnd rank score strand signal_value pvalue qvalue peak
chr20 30298908 30300550 Rank_1 423 . 11.54602 42.39392 34.03034 1197
As you can see, the "strand" column is unspecified ("."). What I am trying to do is check whether peaks in different cell lines and histone marks map to regulatory regions within the genome. I often use the 2000 bp upstream and 500 bp downstream criteria when defining a promoter region. However, taking strand into account, and given the way UCSC defines its txStart and txEnd fields, for negative-strand genes I have to extend the 2000 bp toward higher genomic coordinates from the TSS rather than toward lower ones. This raises the question of whether these peaks are mapped on the positive or the negative strand, or does it not matter? Are the coordinates always given in the positive-strand orientation?
Thank you for your help.
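To make the promoter definition above concrete, here is a minimal Python sketch of a strand-aware promoter window checked against a peak. It assumes that the peak coordinates are plain reference (forward-strand) coordinates, as is the convention for BED-style files, so only the gene's strand matters. The function names and the example gene coordinates are hypothetical, and the single-base offset between txEnd and the last transcribed base is ignored.

```python
def promoter_window(tx_start, tx_end, strand, upstream=2000, downstream=500):
    """Return (start, end) of a promoter window in forward-strand coordinates.

    tx_start < tx_end always holds in UCSC gene tables, regardless of strand,
    so the TSS is tx_start for '+' genes and tx_end for '-' genes.
    """
    if strand == "+":
        tss = tx_start
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        tss = tx_end
        # Upstream of a minus-strand gene lies at HIGHER coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("gene strand must be '+' or '-'")
    return max(start, 0), end


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals on the same chromosome overlap."""
    return a_start < b_end and b_start < a_end


# Peak from the example line above (chr20); the peak itself has no strand.
peak_start, peak_end = 30298908, 30300550

# Hypothetical genes, for illustration only.
plus_promoter = promoter_window(30300000, 30310000, "+")
minus_promoter = promoter_window(30280000, 30298000, "-")

print(plus_promoter, overlaps(peak_start, peak_end, *plus_promoter))
print(minus_promoter, overlaps(peak_start, peak_end, *minus_promoter))
```

The peak interval is never flipped; only the promoter window is placed differently depending on the gene's strand.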
If DNase-seq peaks are called by MACS, is there any standard threshold for annotating genes as accessible or not accessible (based on the peak tag density score)?
There is no standard threshold. You need to choose one yourself, and any value is acceptable as long as it makes biological sense for your data.
Can you point me toward a publication that explains the reasoning behind the threshold it selected?
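As a rough illustration of applying a user-chosen cutoff, the sketch below parses one narrowPeak line and keeps the peak if it passes a threshold. In the narrowPeak format the pvalue and qvalue columns hold -log10 values. The cutoff of 2.0 (q <= 0.01) is only an assumed example value, not a recommendation, and the function names are hypothetical.

```python
def parse_narrowpeak_line(line):
    """Parse one whitespace-separated narrowPeak line into a dict."""
    f = line.split()
    return {
        "chrom": f[0],
        "start": int(f[1]),
        "end": int(f[2]),
        "name": f[3],
        "score": int(f[4]),
        "strand": f[5],
        "signal_value": float(f[6]),
        "neg_log10_p": float(f[7]),   # stored as -log10(p-value)
        "neg_log10_q": float(f[8]),   # stored as -log10(q-value)
        "summit_offset": int(f[9]),   # summit position relative to start
    }


def passes(peak, min_neg_log10_q=2.0):
    """Keep a peak whose -log10(q) meets the chosen cutoff (assumed value)."""
    return peak["neg_log10_q"] >= min_neg_log10_q


line = "chr20 30298908 30300550 Rank_1 423 . 11.54602 42.39392 34.03034 1197"
peak = parse_narrowpeak_line(line)
print(peak["name"], passes(peak))
```

Trying a few cutoffs and checking how the retained peaks behave at known positive and negative control regions is one way to judge whether a threshold is biologically sensible.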