Is there a software tool that reports a measure of the degree of non-uniformity in depth of Illumina sequencing coverage across a de novo assembled genome (against which the Illumina reads are mapped back)?
I have the PE read library (2x150 bp, HiSeq 4000), the de novo assembled genome, and the BAM file from mapping the former to the latter, and I have 290 such data points. I am curious to know which of these 290 have more uniform and which have less uniform coverage depth across their respective genomes.
To reiterate: is there software (something like a supplement to BBTools' bbnorm) that can help me quickly visualize which of my genome assemblies are built on more uniform coverage depth?
One measure of non-uniformity of coverage is the fold-80 penalty (see https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-1-r1). Essentially, it is the amount of additional coverage (in fold coverage of the genome) required so that 80% of the target bases are covered at the current mean coverage. A value close to 1 indicates nearly uniform coverage; larger values indicate less uniform coverage.
The rtg coverage command from RTG Core computes the fold-80 penalty, in addition to other statistics and graphs that can be used to visualize coverage distribution information.
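To make the definition concrete, here is a minimal Python sketch of one common way to compute a fold-80 penalty: the mean depth divided by the depth that 80% of bases meet or exceed. This is not RTG's implementation. The function name `fold80_penalty` is hypothetical, the percentile uses a simple nearest-rank rule, and zero-coverage positions are not excluded, so the numbers may differ slightly from what a given tool reports.

```python
from statistics import mean


def fold80_penalty(depths):
    """Mean depth divided by the depth that ~80% of bases meet or exceed.

    `depths` is a sequence of per-base coverage depths (one value per position).
    Assumption: nearest-rank 20th percentile, all positions included.
    """
    if not depths:
        raise ValueError("no depth values given")
    ordered = sorted(depths)
    # Value at the 20th percentile: about 80% of positions have at least this depth.
    p20 = ordered[int(0.2 * len(ordered))]
    if p20 == 0:
        # More than 20% of positions are uncovered; the ratio is undefined.
        return float("inf")
    return mean(ordered) / p20


if __name__ == "__main__":
    uniform = [30] * 100
    uneven = [10] * 30 + [40] * 70
    print(fold80_penalty(uniform))
    print(fold80_penalty(uneven))
```

Per-base depths could come from, for example, the third column of `samtools depth -a` output. For a full genome you would probably want to accumulate a depth histogram rather than hold every position in a list.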
From the histogram you can visualize the uniformity of the coverage. stats.txt will contain the average coverage and standard deviation on a per-scaffold basis. The program will also print to the screen the overall average coverage and standard deviation.
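If you want a single number per assembly to rank all 290, one option is the coefficient of variation (standard deviation divided by mean) of the depth. Below is a rough Python sketch that derives it from a depth histogram. It assumes you have already parsed the histogram into a dict mapping depth to number of bases; the function name `coverage_cv` and that dict layout are my own assumptions, not part of any tool's output format.

```python
import math


def coverage_cv(hist):
    """Return (mean, std, cv) for a depth histogram.

    `hist` maps a coverage depth to the number of bases observed at that depth.
    Uses the population standard deviation.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    mean = sum(depth * count for depth, count in hist.items()) / total
    variance = sum(count * (depth - mean) ** 2 for depth, count in hist.items()) / total
    std = math.sqrt(variance)
    # CV is undefined when the mean depth is zero.
    cv = std / mean if mean > 0 else float("nan")
    return mean, std, cv


if __name__ == "__main__":
    print(coverage_cv({30: 100}))
    print(coverage_cv({10: 50, 30: 50}))
```

A lower CV means more uniform coverage, so sorting the assemblies by this value gives a quick ranking. Because it is normalized by the mean, it stays comparable between samples sequenced to different depths.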
You've got the stats= flag twice, so could you have meant out=covstats.txt?
Fixed, thanks :) It actually doesn't matter (the second stats= overrides the first one). For pileup.sh, covstats, stats, and out are synonymous...