I am writing a custom Python script that finds the average coverage at each position surrounding the TSS. The script takes a bedGraph file and a GTF file as input.
My question is: do you think ignoring regions of zero coverage will adversely affect my TSS plot? For example:
distance from TSS | coverage
-------------------------|--------------
-3000 | 0
-3000 | 2.3
-3000 | 0
-2999 | 0
-2999 | 0
-2888 | 3.1
-2888 | 2.9
-2888 | 2.1
-2888 | 0
It may seem like a strange question, but I need to change my workflow in order to consider the values of zero coverage. So would the overall trend of the metaplot be different if I ignore the regions of zero?
Thanks
It is just the average for each position, so for position -3000 from TSS, my average would have been (0 + 2.3 + 0) / 3. Each row represents a different TSS (different gene). The coverage values are normalized per million. Sorry I realize it isn't totally clear but I did not want to go into the details of my python script.
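To make the comparison concrete, here is a minimal sketch that averages the example rows from the table above twice, once keeping the zeroes and once dropping them. The function name `mean_by_position` and the `skip_zeros` flag are made up for illustration; this is not the actual script from the question.

```python
from collections import defaultdict

# (distance from TSS, coverage) rows copied from the example table above;
# each row is assumed to come from a different TSS
rows = [
    (-3000, 0.0), (-3000, 2.3), (-3000, 0.0),
    (-2999, 0.0), (-2999, 0.0),
    (-2888, 3.1), (-2888, 2.9), (-2888, 2.1), (-2888, 0.0),
]


def mean_by_position(rows, skip_zeros=False):
    """Average coverage per position; optionally drop zero-coverage rows."""
    values = defaultdict(list)
    for pos, cov in rows:
        if skip_zeros and cov == 0:
            continue
        values[pos].append(cov)
    return {pos: sum(v) / len(v) for pos, v in values.items()}


with_zeros = mean_by_position(rows)
without_zeros = mean_by_position(rows, skip_zeros=True)

for pos in sorted(with_zeros):
    # A position whose rows are all zero has no entry once zeroes are dropped,
    # so .get() returns None for it
    print(pos, round(with_zeros[pos], 3), without_zeros.get(pos))
```

For these rows, dropping the zeroes raises the mean at -3000 from about 0.77 to 2.3, and -2999 disappears from the result because all of its values are zero. The shape of the averaged profile can therefore change wherever the fraction of zero-coverage TSSs differs from one position to the next.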
I have been using an R package, "ChIPseeker", which gives me a TSS plot but does too much under the hood. I am writing my own script to have more control over the data. I'm going to redesign my workflow so I can use the zeroes.
Sorry for the weird question, I was just hoping to avoid including the zeroes.
If you already decided to exclude zeroes, what is your question about?
deepTools' `computeMatrix` lets you tune numerous parameters, including the handling of zeros (`--skipZeros`) and the type of calculation (mean, median, ..., `--averageTypeBins`). Plus, it's fairly fast and optimized, extensively tested, and widely used.
It's not that I decided to exclude them. My question was about the effect of including or excluding zeroes. I have the zeroes now; they were missing before because I was using a tool that converted my BAM files into coverage files in a way that skipped regions of zero coverage.
I'll admit I was just being a bit lazy with not wanting to go back and change my workflow. Thanks for telling me about computeMatrix! I have used deepTools but not tried the computeMatrix function.
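On the case where the coverage file simply omits zero-coverage regions: one possible way to still get the zero-inclusive average, without regenerating the file, is to sum the observed values per position and divide by the total number of TSSs instead of by the number of rows present. The sketch below assumes every TSS window spans every position (no truncation at chromosome ends). The function name `mean_with_implicit_zeros` and the count of 4 TSSs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def mean_with_implicit_zeros(sparse_rows, n_tss, positions):
    """Per-position mean in which absent (TSS, position) pairs count as zero.

    sparse_rows: (distance from TSS, coverage) pairs with zero rows omitted
    n_tss:       total number of TSSs in the analysis (assumed known)
    positions:   every position that should appear in the profile
    """
    totals = defaultdict(float)
    for pos, cov in sparse_rows:
        totals[pos] += cov
    # Dividing by n_tss rather than by the number of observed rows
    # treats every missing row as a coverage of zero
    return {pos: totals[pos] / n_tss for pos in positions}


# Non-zero rows only, as a zero-skipping coverage tool would leave them
sparse_rows = [(-3000, 2.3), (-2888, 3.1), (-2888, 2.9), (-2888, 2.1)]

# Hypothetical: 4 TSSs in total, each contributing to all three positions
means = mean_with_implicit_zeros(sparse_rows, n_tss=4,
                                 positions=[-3000, -2999, -2888])
for pos in sorted(means):
    print(pos, round(means[pos], 3))
```

This only works if the total number of TSSs contributing to each position is known independently of the coverage file.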