I am just learning to understand ENCODE ChIP-seq data for transcription factor (TF) binding. I looked at the narrowPeak files and found a column named "Score". Is this the tag density, indicating the binding affinity of the TF at this site or region? If not, how can I get the tag density (or binding affinity)?
ENCODE narrowPeak: Narrow (or Point-Source) Peaks format
This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+4 format.
chrom - Name of the chromosome (or contig, scaffold, etc.).
chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
name - Name given to a region (preferably unique). Use '.' if no name is assigned.
score - Indicates how dark the peak will be displayed in the browser (0-1000). If all scores were '0' when the data were submitted to the DCC, the DCC assigned scores 1-1000 based on signal value. Ideally the average signalValue per base spread is between 100-1000.
strand - +/- to denote strand or orientation (whenever applicable). Use '.' if no orientation is assigned.
signalValue - Measurement of overall (usually, average) enrichment for the region.
pValue - Measurement of statistical significance (-log10). Use -1 if no pValue is assigned.
qValue - Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.
peak - Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called.
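To make the column layout concrete, here is a minimal Python sketch (our own illustration, not an ENCODE tool) that parses one tab-separated narrowPeak line into typed fields. It follows the spec above: coordinates are 0-based and half-open, and -1 in pValue, qValue or peak means "not assigned". signalValue is parsed as a float because the spec only describes it as a measurement of enrichment and does not promise integers. The class and function names are hypothetical, and the example line is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NarrowPeak:
    chrom: str
    chrom_start: int            # 0-based, inclusive
    chrom_end: int              # exclusive (half-open interval)
    name: str
    score: int                  # browser shading, 0-1000; not a density or p-value
    strand: str
    signal_value: float         # overall (usually average) enrichment
    p_value: Optional[float]    # -log10(p); None when the file has -1
    q_value: Optional[float]    # -log10(q); None when the file has -1
    peak: Optional[int]         # summit offset from chrom_start; None when -1

    @property
    def length(self) -> int:
        # Half-open coordinates: length is simply end - start.
        return self.chrom_end - self.chrom_start


def _missing_to_none(value: float) -> Optional[float]:
    # The spec uses -1 to mean "no value assigned".
    return None if value == -1 else value


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(fields)}")
    peak = int(fields[9])
    return NarrowPeak(
        chrom=fields[0],
        chrom_start=int(fields[1]),
        chrom_end=int(fields[2]),
        name=fields[3],
        score=int(fields[4]),
        strand=fields[5],
        signal_value=float(fields[6]),
        p_value=_missing_to_none(float(fields[7])),
        q_value=_missing_to_none(float(fields[8])),
        peak=None if peak == -1 else peak,
    )


# Made-up example line, for illustration only.
rec = parse_narrowpeak_line("chr1\t9980\t10480\t.\t1000\t.\t12.5\t45.2\t40.1\t230")
print(rec.length, rec.p_value, rec.peak)  # 500 45.2 230
```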
Thanks. So does that mean signalValue is the tag density? In the few samples I looked at, the values were always integers; is that always the case?
One more question: how do I merge information from replicates? The replicates never seem to contain exactly the same regions. Which regions from different replicates can be treated as the same region/site?
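Hi, I somehow missed this. Yes, signalValue measures the (usually average) enrichment of reads in the region, so it is the column that reflects tag density. The score column only controls how dark the peak is drawn in the browser.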
For merging replicates, you can:
1) Merge the FASTQ files, if they are technical replicates (not the best option).
2) Analyse the replicates separately and use bedtools intersectBed to find the overlapping regions, either on the mapped BED files or on the significant binding sites (this is much better).
3) Calculate the tag density for a specific locus (e.g. TSS +/- 3 kb). Since both samples then cover the same locus, you can compare, merge or average them, but don't forget to normalize by read count (sequencing depth).
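In practice option 2 means running bedtools itself, but the idea behind it can be sketched in plain Python: keep the peaks from replicate 1 that overlap at least one peak from replicate 2 on the same chromosome, which is roughly what `bedtools intersect -u -a rep1.bed -b rep2.bed` reports. The min_overlap parameter and the function names are illustrative assumptions, and the simple nested loop is only meant for small inputs; use bedtools on genome-wide peak sets.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    # Number of shared bases between two half-open intervals (0 if disjoint or adjacent).
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def reproducible_peaks(rep1: List[Interval], rep2: List[Interval],
                       min_overlap: int = 1) -> List[Interval]:
    """Return peaks of rep1 that share at least min_overlap bases with any rep2 peak."""
    # Index replicate 2 by chromosome so we only compare peaks on the same chromosome.
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in rep2:
        by_chrom[chrom].append((start, end))

    kept: List[Interval] = []
    for chrom, start, end in rep1:
        # .get() avoids creating empty entries for chromosomes absent from rep2.
        if any(overlap_bp(start, end, s, e) >= min_overlap
               for s, e in by_chrom.get(chrom, [])):
            kept.append((chrom, start, end))  # each rep1 peak reported at most once
    return kept


rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]
print(reproducible_peaks(rep1, rep2))  # [('chr1', 100, 200)]
```

Raising min_overlap is one way to demand that replicate peaks share a substantial stretch of sequence before treating them as the same site.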
Thanks for replying. I came back to reread your reply and have another question. Is it reasonable to calculate the binding difference between two TFs at the same position by subtracting the signalValue of one TF from that of the other? Thanks.
Yeah, it's feasible. A better approach is to define a genomic locus, calculate the area under the signal curve for each TF normalized by read depth, and then compare.
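A minimal sketch of that suggestion, assuming you already have per-base read coverage over the same locus for each TF experiment (for example, extracted from a coverage or bigWig track) and the total number of mapped reads for each library. With one value per base, the area under the curve is just the sum of the coverage, which is then scaled to reads per million mapped reads so that libraries of different depth are comparable. The function names are hypothetical and the toy coverage values are made up.

```python
from typing import Sequence


def normalized_auc(coverage: Sequence[float], total_mapped_reads: int) -> float:
    """Area under a per-base coverage track, scaled per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    # One value per base, so the area under the step curve is the plain sum.
    area = float(sum(coverage))
    return area * 1_000_000 / total_mapped_reads


def binding_difference(cov_a: Sequence[float], depth_a: int,
                       cov_b: Sequence[float], depth_b: int) -> float:
    """Depth-normalized signal of TF A minus that of TF B over the same locus."""
    if len(cov_a) != len(cov_b):
        raise ValueError("both tracks must cover the same locus")
    return normalized_auc(cov_a, depth_a) - normalized_auc(cov_b, depth_b)


# Toy 6-bp locus: TF A sequenced to 2M reads, TF B to 1M reads.
tf_a = [0, 2, 4, 4, 2, 0]
tf_b = [0, 1, 1, 1, 1, 0]
print(binding_difference(tf_a, 2_000_000, tf_b, 1_000_000))  # 2.0
```

Without the depth scaling, TF A would look three times stronger here (12 vs 4 raw reads) purely because its library was sequenced twice as deeply.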
Dear Moderator: I have a question about the narrowPeak format. In general, a BED file is defined as chrom / chromStart / chromEnd / name / score / strand / ..., where the score column refers to the significance of the peak signal. However, I need to convert the score column to a p-value (the p-value format could be 1-based, 10-based or 100-based). How can I get the peak p-value in the desired format and add it as new metadata? Could you give me a possible idea, please? Thanks a lot :)
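According to the spec quoted at the top of this thread, the narrowPeak score column is a display shade (0-1000), not a p-value; significance is already stored in column 8 (pValue) as -log10(p). Below is a minimal Python sketch that converts that column back to a raw p-value and appends it as an extra 11th column, assuming "new metadata" means an additional tab-separated column (the result is then no longer strict BED6+4). The function names are hypothetical.

```python
from typing import Optional


def neglog10_to_p(neglog10_p: float) -> Optional[float]:
    """Convert a -log10(p) value back to a raw p-value; -1 means 'not assigned'."""
    if neglog10_p == -1:
        return None
    return 10 ** (-neglog10_p)


def add_raw_pvalue_column(line: str) -> str:
    """Append the raw p-value (3 significant digits, or NA) as an 11th column."""
    fields = line.rstrip("\n").split("\t")
    p = neglog10_to_p(float(fields[7]))  # column 8 (0-based index 7) is pValue
    fields.append("NA" if p is None else f"{p:.3g}")
    return "\t".join(fields)


# Made-up example line with pValue = 3.0, i.e. p = 0.001.
print(add_raw_pvalue_column("chr1\t10\t20\t.\t0\t.\t5.0\t3.0\t-1\t-1"))
```

If you need the value on another log base b instead, -log_b(p) is simply -log10(p) divided by log10(b).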
Hi, can you please explain more about the "peak" field?
It is the position of highest signal intensity for the target protein or mark within the peak, given as a 0-based offset from chromStart. For H3K36me3, for example, it is the point of highest methylation signal.
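Because the peak field is an offset, its genomic coordinate is chromStart + peak. The sketch below computes that position and builds a small half-open window around it, for example to extract sequence around the summit. The midpoint fallback when peak is -1 and the default 50 bp flank are our own assumptions, not part of the format, and the function names are hypothetical.

```python
from typing import Tuple


def summit_position(chrom_start: int, chrom_end: int, peak: int) -> int:
    """Absolute 0-based coordinate of the peak summit."""
    if peak == -1:
        # No point-source called: fall back to the interval midpoint (our assumption).
        return (chrom_start + chrom_end) // 2
    return chrom_start + peak


def summit_window(chrom: str, chrom_start: int, chrom_end: int, peak: int,
                  flank: int = 50) -> Tuple[str, int, int]:
    """Half-open window of flank bases on each side of the summit base.

    The start is clipped at 0; the end is not clipped to the chromosome length.
    """
    s = summit_position(chrom_start, chrom_end, peak)
    return chrom, max(0, s - flank), s + flank + 1


print(summit_position(9980, 10480, 230))               # 10210
print(summit_window("chr1", 9980, 10480, 230, flank=50))  # ('chr1', 10160, 10261)
```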