Quality control of Chip-seq data (NRF and PCB)
0
0
Entering edit mode
21 months ago

Hi, I'm checking my ChIP-seq library, and I'm following ENCODE guidelines but I can't figure out whether NRF and PCB analysis should be done before or after removing PCR duplicates or before. I know this is probably a stupid question but I just can't figure it out.

Thank you

control library quality PCB Chip-seq NRF • 1.5k views
ADD COMMENT
0
Entering edit mode

If you remove the duplicates then the non-redundant/non-duplicated fraction will be 1.0. But if you save the log file during the de-duplication then you will get the number of replicates.

ADD REPLY
0
Entering edit mode

Can you explain me better? Because for the NRF I do :

a) samtools view -c .bam b)samtools view -c -F 260 .bam

then b/a

and as a bam file I use the one that I deduplicate

ADD REPLY
0
Entering edit mode

If you use samtools markdup then samtools markdup -r -f foo.log foo.bam foo.dedup.bam should give you deduplicated BAM and duplication stats. Does it answer your question?

ADD REPLY
0
Entering edit mode

Not really, because I already have deduplicate my bam file so I already know the number of deduplicate reads. I have a bam file that is without the deduplication (my final bam) and a bam file that keep it. My question is how calculate NRF now that I have the two files.

ADD REPLY
0
Entering edit mode

Any resolution? im struggling with the same problems for obtaining the NRF if it works , I used this for calculating PBC.

PBC <- function(IP) {

require(GenomicAlignments)
require(data.table)

# load ChIP sample if necessary
if (is.character(IP)) {
    if (!file.exists(IP))
        stop(paste("File", IP, "does NOT exist."))
    else
        aln <- readGAlignments(IP)
} else if (class(IP) == "GAlignments") {
    aln <- IP
} else {
    stop("IP must be a file path or a GAlignments object.")
}

# convert GAlignments object to data.table for fast aggregation
aln <- data.table(
    strand=as.factor(BiocGenerics::as.vector(strand(aln))),
    seqnames=as.factor(BiocGenerics::as.vector(seqnames(aln))),
    pos=ifelse(strand(aln) == "+", start(aln), end(aln))
)

# aggregate reads by position and count them
readsPerPosition <- aln[,list(count=.N), by=list(strand, seqnames, pos)]$count

# PBC = positions with exactly 1 read / positions with at least 1 read
PBC <- sum(readsPerPosition == 1) / length(readsPerPosition)

return(PBC)

} an then use this... PBC(your-bam.bam)

ADD REPLY

Login before adding your answer.

Traffic: 2066 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6