Hi, I'm running Picard on some BAM files and noticed that by default CollectWgsMetrics reports its metrics for the whole genome. I would like to find the standard deviation of the coverage for each chromosome.
Does anyone know how to do this, even with a different program? I have the output from bedtools genomecov, but I do not know how to derive the standard deviation of read depth for each chromosome from it.
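For what it's worth, if the bedtools output is the default histogram format (columns: chromosome, depth, number of bases at that depth, chromosome size, fraction), the per-chromosome mean and standard deviation can be computed directly from the counts. Below is a minimal Python sketch under that assumption. The function name `per_chrom_depth_stats` and the file name in the usage comment are made up for illustration, and it reports the population standard deviation across all bases of each chromosome, including zero-depth bases.

```python
import math
from collections import defaultdict


def per_chrom_depth_stats(lines):
    """Return {chrom: (mean_depth, sd_depth)} from genomecov histogram lines.

    Assumes the default histogram output of `bedtools genomecov`:
    chrom, depth, bases_at_depth, chrom_size, fraction (whitespace separated).
    """
    n_bases = defaultdict(int)   # total bases seen per chromosome
    sum_d = defaultdict(int)     # sum of depth over all bases
    sum_d2 = defaultdict(int)    # sum of depth squared over all bases

    for line in lines:
        fields = line.split()
        # Skip blank/short lines and the whole-genome summary rows.
        if len(fields) < 3 or fields[0] == "genome":
            continue
        chrom, depth, count = fields[0], int(fields[1]), int(fields[2])
        n_bases[chrom] += count
        sum_d[chrom] += depth * count
        sum_d2[chrom] += depth * depth * count

    stats = {}
    for chrom, n in n_bases.items():
        if n == 0:
            continue
        mean = sum_d[chrom] / n
        # Population variance: E[d^2] - (E[d])^2, clamped against rounding error.
        variance = max(sum_d2[chrom] / n - mean * mean, 0.0)
        stats[chrom] = (mean, math.sqrt(variance))
    return stats


# Example usage (hypothetical file name):
# with open("sample.genomecov.txt") as handle:
#     for chrom, (mean, sd) in per_chrom_depth_stats(handle).items():
#         print(chrom, round(mean, 2), round(sd, 2))
```

If the file was instead produced with per-base output (the -d option), the same idea applies, but each line is then one position with its depth rather than a histogram bin, so the parsing would need to change.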
This would be a really nice feature for all the picard Collect* commands. In our case, looking at the whole genome together isn't appropriate since some of the contigs are repeated elements (and some may well be contamination or other cruft). A few select contigs would provide a better characterization. Also, the current "all the 'genome'" approach doesn't make sense if you are competitively mapping to multiple genomes at once (doing WGS metagenomics for example).
IMO, the ideal implementation would be a command-line argument (let's call it CHROMS) where you give a list of chroms/contigs to process together as a single unit. The default (null) would process all the chroms/contigs.
To generate stats for each chrom/contig, you could run Picard multiple times, specifying a single chrom/contig each time. That is probably not ideal, but it would be trivial to implement. Alternatively, if the framework allows it, the CHROMS argument could be given multiple times, and the stats for each one would be output separately (and perhaps overall stats too).
Anyway, let me see if I can find the right place to send a feature request. You might want to do the same.
Apologies... I cannot seem to get this to actually work myself. In theory it should work, but there is nastiness with the reference and bam header which I just can't seem to sort out.
You can stream individual regions to CollectWgsMetrics (thanks to Nils Homer for reminding me of that).
You could just do that for each chromosome.