Hello!
Does anybody dealt with this type of graph "A plot of genome coverage against normalized average depth" http://prntscr.com/df9wmc. I really need it for optimization of procedures of WGS. I was trying to figure it out how to generate it, but I couldn't. The description in methods section was not obvious for me (http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-1, Methods, Genome coverage).
"Mapped reads for each replicate data sets were normalized by average genome coverage to 11× and 21× for clinical and non-clinical samples respectively, and then pooled or analyzed independently depending on analysis metrics used. For the clinical samples, reads originating from host contamination were filtered out (by aligning all reads to P. falciparum 3D7 reference genome) prior to analysis, hence the relatively low average genome coverage depth for these samples. Evaluation of coverage depth and evenness was based on cumulative distributions over a normalized overall average depth. A measurement of low-coverage index lci (d) is defined as the integration of the cumulative distribution C(x) from 0 to d giving an overall assessment of the coverage at the low end of distribution: lci(d)=∫d0C(x)dx The value lci (0.5) that gives a measurement of the coverage below half of the average depth in the distribution was used to compare evenness of coverage for each data set."
Especially I can't understand the method of normalization "... a normalized overall average depth". It just df$norm_depth <- with(df, (depth/median(over_all_avr_depth))) or not???
I'll be grateful for any links, suggestions, and scripts in Python or R.
Thank you for your reply. I have similar doubts about "theoretical" line. Let me explain my pipeline 1. I create coverage column for each position of ref genome
$ bedtools coverage -abam some.bam -b fasta.tab -d > cov.txt
extrat last columncut -f 6 cov.txt > genome_depth
2. In Rtab <- read.table("genome_depth", header = FALSE)
Then I created frequency table of different depths
freq_tab <- as.data.frame(table(tab$V1))
and calculate percent of each length with specific depth value in this table
freq_tab$perc <- with(freq_tab, (100*Freq)/length(tab$V1))
Then I apply simple normalisation by dividing each depth by median depth
dc <- as.numeric(levels(freq_tab$Var1))
andfreq_tap$scale_depth <- with(freq_tab, (dc/median(dc)))
And plot this data
plot(freq_tab$scale_depth, cumsum(freq_tab$perc),type="l")
http://prntscr.com/dfawi2
What do you think is this correct way to obtain such graph? When I add the second sample I get this plot http://prntscr.com/dfb05s
They are similar to what I want, but graphics from different samples in the example picture intersect at a certain point which conveniently display slight differences in the slope.