Methods to subset sequenced experiments to account for differences in read counts?
1
0
Entering edit mode
10.4 years ago

Hi All,

I'm looking for advice and also to know what people usually do to account for differences between different sequenced experiments. For example if you have 2 ChIP-seq/RNA-seq/... experiments, one with 60M reads and the other with 40M reads, how do you obtain:

  1. BAM/SAM/BED file with the same number of reads?
  2. A normalized WIGGLE/BEDGRAPH/BIGWIG signal track (i.e. RPM units)?

For the first question I searched a bit and found that "samtools view" have the option "-s" which you could indicate the fraction of templates to subsample from the raw mapped file, like this:

samtools view -bh -s 0.66 bam.file > subsampled.bam.file

For the second question I was thinking to take a wiggle file and divide values at each bp by total number of reads and then multiply by 1e6, like this:

awk '{if($1 !~ /^fixed/){OFS="\t"; printf "%0.2f\n",($1/60000000)*1e6}else{print $0}}' raw.fixedStep.wig.file > normalized.wiggle.file

Do you think this methods are good to account for read differences and to normalize signal tracks?

Thanks!

genome sequencing down-sampling • 2.6k views
ADD COMMENT
0
Entering edit mode

For #1, why do you need that? It would be easier to use a library normalization and then weight samples accordingly in the followup analysis (this is how most RNAseq analysis packages work).

ADD REPLY
0
Entering edit mode
10.4 years ago

Have you tried DownsampleSAM from Picard? And it would be not very good idea to "normalize" a bed signal track, you could easily shift some of your peaks below stochastic threshold (1/coverage). I would rather recommend to downsample SAM and then re-run your analysis. In case there is a big difference between initial size and down-sampled size, run down-sampling several times to create replicates. In this case don't forget to change seed (RANDOM_SEED parameter)

ADD COMMENT

Login before adding your answer.

Traffic: 2406 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6