ChIP-Seq Normalization
13.3 years ago
Repineme ▴ 120

Hi,

I have sequenced human ChIP-Seq samples from 2 different experiments using Illumina. The numbers of reads are not equivalent between the 2 samples (heart ChIP-Seq = 2 million tags, kidney ChIP-Seq = 10 million tags), and I have no replicates.

Whenever I try to plot raw reads around promoters, the plot fails (one flat line at the top and another at the bottom) because of the difference in the number of reads. Does anyone know the BEST way to deal with this?

I tried this (not successful):

position_cDNAnorm = (position_cDNA / sum_cDNA) * average_sum_cDNA

  • position_cDNAnorm = normalised cDNA value for specific position and specific DBP
  • position_cDNA = cDNA value for specific position and specific DBP
  • sum_cDNA = total cDNA count for specific DBP
  • average_sum_cDNA = average of total cDNA counts of all DBPs

DBP = DNA Binding Protein (transcription factor)
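
For reference, the formula above can be sketched in Python roughly as follows. This is only a minimal illustration: the function name, the dictionary layout (one list of per-position counts per sample), and the toy numbers are all assumptions, not part of the original analysis.

```python
def normalize_to_average_depth(counts_by_sample):
    """Scale each sample so that its total equals the mean total of all samples.

    counts_by_sample: hypothetical dict mapping a sample (or DBP) name to a
    list of per-position counts.
    """
    # sum_cDNA for each sample
    totals = {name: sum(counts) for name, counts in counts_by_sample.items()}
    if any(total == 0 for total in totals.values()):
        raise ValueError("every sample needs at least one count")
    # average_sum_cDNA across all samples
    average_total = sum(totals.values()) / len(totals)
    # position_cDNAnorm = (position_cDNA / sum_cDNA) * average_sum_cDNA
    return {
        name: [count / totals[name] * average_total for count in counts]
        for name, counts in counts_by_sample.items()
    }


# Toy per-position counts, made up purely for illustration.
samples = {"heart": [1, 3, 0, 4], "kidney": [10, 20, 5, 5]}
normalized = normalize_to_average_depth(samples)
```

After this scaling every sample has the same total. The result differs from a reads-per-million value only by a constant factor shared across samples, so the relative shape of each profile is unchanged.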
data chip-seq • 12k views

What if you get the signal (bedGraph), say from MACS run with the -B option, then calculate the average tag count in the same window for both samples and divide by the total number of mapped reads, in millions?

13.3 years ago
seidel 11k

Instead of plotting raw reads, plot the rate at which reads are observed in a given location. It sounds odd expressed that way, but basically what you want to observe is reads per million per nucleotide (RPM). However, since nucleotide resolution is pretty extreme, people usually pick a larger bin, say 25 nucleotides, and then you calculate the number of reads that fall into that bin divided by the number of reads in the sample data set, then multiply by 10^6 to get per million. In this way you get an RPM track of 25 base bins covering the genome, thus samples with different numbers of reads become comparable. If your data is in the form of a vector representing coverage, this is especially easy to do in R.
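
The binning described above could be sketched like this in plain Python (the answer mentions R, so this is just one possible rendering). It assumes reads are represented by 0-based start positions on a single chromosome and that `total_reads` is the number of mapped reads in the whole sample; the function name and the toy positions are hypothetical.

```python
from collections import Counter


def rpm_per_bin(read_starts, total_reads, bin_size=25):
    """Return {bin_start: reads per million} for one chromosome.

    read_starts: iterable of 0-based read start positions (assumed input).
    total_reads: number of mapped reads in the whole sample.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Assign each read to the bin that contains its start position.
    bin_counts = Counter((pos // bin_size) * bin_size for pos in read_starts)
    # Reads in the bin / reads in the sample * 10^6.
    scale = 1_000_000 / total_reads
    return {start: n * scale for start, n in sorted(bin_counts.items())}


# Toy positions, made up for illustration; the totals mirror the question.
heart = rpm_per_bin([3, 10, 24, 30, 51], total_reads=2_000_000)
kidney = rpm_per_bin([3, 10, 24, 30, 51], total_reads=10_000_000)
```

With identical raw counts, the deeper sample gets proportionally lower RPM values, which is the correction for sequencing depth. Bins with no reads are simply absent from the result and would need to be filled with zeros before plotting.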

Both of your issues, identification of enriched regions at promoters and quantile normalization of reads, are described well in the supplemental material of two papers from the Young Lab: Rahl et al. (2010) Cell and Bilodeau et al. (2010) Genes Dev.


I'm not sure what you mean by "didn't work". Either a region has coverage, or it doesn't. If it doesn't have coverage - there is no way to get coverage besides doing more sequencing or repeating the experiment. If it does have coverage, then you should be able to visualize it by simply loading the indexed BAM file to UCSC. If you want to normalize that coverage, then you can convert it to something like reads per million for a given bin size, but even then, whether the regions show similar patterns or depth is a matter of experimental determination (as opposed to assumption).


Seems logical. Once I have a table of 25-bp bins (chr-start-end-strand, data1 RPM coverage, data2 RPM coverage; 3 columns), is there any way to plot them around my own genomic regions (TSS or exon-intron junctions)?


This didn't work either. It produced the same results as my own normalization.
