Normalising ChIP-seq data to RPM using a 1 or 2 count for paired-end data?
3
1
Entering edit mode
9.8 years ago
James Ashmore ★ 3.5k

I want to normalise my paired-end ChIP-seq data (4 samples treated, 4 samples control) to reads per million (RPM). So far my calculation takes the number of reads mapped to a position, divides it by the total number of mapped reads in the sample, and then multiples this by one million. My question is, should the number of mapped paired-end reads be counted as 1 or 2 in the calculation? I believe Samtools reports the total number of mapped reads as half the actual number of paired-end reads which map?

My current pseudo script for normalising to RPM:

  • mappedReads = samtools view -c -F 4 input.bam
  • scalingFactor = 1000000 / mappedReads
  • bedtools genomecov -ibam input.bam -bg - scale scalingFactor -g chrom.sizes > output.bedGraph
  • bedGraphToBigWig input.bedGraph chrom.sizes output.bigWig
ChIP-Seq normalisation paired-end • 11k views
ADD COMMENT
0
Entering edit mode

For paired end data, rpm=fpm (fragments per million). So, it should be counted as one.

ADD REPLY
0
Entering edit mode

Just to clarify, you are saying that each pair of reads should be counted as just one? So in a perfect world where 100% of my reads align, the number of total alignments would be half the total number of reads in the data?

ADD REPLY
0
Entering edit mode

Yup. Count a pair of reads as one. For single end reads each read represents a fragment it is sequenced from whereas for paired-end, a pair of sequence represents the fragment it is sequenced from. As simple as that.

ADD REPLY
8
Entering edit mode
9.8 years ago
Fidel ★ 2.0k

You can use deepTools to do the normalisation that you want. The output is a bigwig file. It is quite fast if you have multiple cores.

bamCoverage -b file.bam --normalizeUsingRPKM -o file.bw

The program first extends the read to match the paired-end length before computing the coverage. I opted to count paired end reads as 2 and not as 1 to avoid a bias when a read is not properly paired which could be a significant fraction.

ADD COMMENT
0
Entering edit mode

Thank you for the suggestion, I used Bowtie2 to map my reads but indicated to remove discordant and mixed alignments, in this case counting each pair of reads as 1 would be preferred

ADD REPLY
0
Entering edit mode

One option is to filter your bam file to include only the first mate, then use it with bamCompare. If all your reads are properly paired this should work.

ADD REPLY
0
Entering edit mode

I just pushed a commit to github enabling bamCoverage to select reads based on the SAM flag. Now you can do something like this to get what you want:

bamCoverage -b file.bam --normalizeUsingRPKM -o file.bw --samFlag 64

--samFlag 64 means to select only the first mate.

ADD REPLY
0
Entering edit mode
9.8 years ago
Manvendra Singh ★ 2.2k

I think each read should be counted as 1

or

you can try MAnorm for normalization if it suits for your analysis or comparisons

ADD COMMENT
0
Entering edit mode

Initially I just want to create a coverage track for the normalised data.

ADD REPLY
0
Entering edit mode

Okay, I agree with Fidel

ADD REPLY
0
Entering edit mode
9.8 years ago
Ian 6.1k

In case you do not already know MACS2 will generate coverage plots for the non-redundant fragments used in the analysis (by default 1 unique fragment per strand) normalised to millions of reads using the --SPMR flag.

ADD COMMENT
0
Entering edit mode

I think MACS does not use the paired-end data to extend the reads. Does it extend the reads?

ADD REPLY
0
Entering edit mode

MACS2 does not need to extend reads as it knows the start and end of the fragment by using the paired end reads information in the BAM file.

ADD REPLY
0
Entering edit mode

For this you need to use the BAMPE mode of MACS2 which I am not sure it properly works. We tried it and didn't work, also I see numerous complains in the mailing list. Moreover, the bam file has to be sorted by name. I always use MACS2 for peak calling but not coverage tracks.

ADD REPLY

Login before adding your answer.

Traffic: 1950 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6