Picard CollectWgsMetrics for each chromosome? Standard deviation of coverage per chromosome
3
3
Entering edit mode
10.2 years ago

Hi I'm running Picard on some BAM files and noticed that the default output for CollectWgsMetrics is for the whole genome. I would like to find the standard deviation of the coverage for each chromosome.

Does anyone know how to make this possible? Even using a different program. I have the output from bedtools genomeCov but I do not know how to infer the standard deviation of read depth for each chromosome.

standarddeviation command picard • 6.4k views
ADD COMMENT
0
Entering edit mode

This would be a really nice feature for all the picard Collect* commands. In our case, looking at the whole genome together isn't appropriate since some of the contigs are repeated elements (and some may well be contamination or other cruft). A few select contigs would provide a better characterization. Also, the current "all the 'genome'" approach doesn't make sense if you are competitively mapping to multiple genomes at once (doing WGS metagenomics for example).

IMO, the ideal implementation would be a command-line argument (lets call it CHROMS) where you give a list of chroms/contigs to process together as a single unit. The default (null) would process all the chroms/contigs.

To generate stats for each chrom/contig, you could either run picard multiple times, specifying a single chrom/contig each time. Probably not ideal, but it would be trivial to implement. Alternatively, if the framework allows it, the CHROMS argument could be given multiple times, and the stats for each one would be output separately (and perhaps overall stats too).

Anyway, let me see if I can find the right place to send a feature request to. You might want to do the same.

ADD REPLY
0
Entering edit mode

Apologies... I cannot seem to get this to actually work myself. In theory it should work, but there is nastiness with the reference and bam header which I just can't seem to sort out.

You can stream individual regions to CollectWgsMetrics (thanks to Nils Homer for reminding me of that).

samtools view -b <in.bam> <region> | java -jar CollectWgsMetrics.jar I=/dev/stdin ...

You could just do that for each chromosome.

ADD REPLY
1
Entering edit mode
10.1 years ago
travcollier ▴ 210

I added INTERVAL_RANGE (taking an interval file) and INTERVAL_RANGE_STRING (taking a whitespace separated list of samtools-like chrom[:start-[end]] regions) to CollectWgsMetrics. Hopefully these changes will get merged into the picard distribution soon.

In the meantime, the source is on github: https://github.com/travc/picard I've also uploaded CollectWgsMetrics.jar: https://github.com/travc/gittemp

To generate per-chromosome output using my version of CollectWgsMetrics.jar (assuming you have the command parallel avaialbe):

ref.fa is your indexed reference sequence (index file is ref.fa.fai)

in.bam is your bam file... indexed if you want the commands to be fast

parallel --gnu --colsep '\t' -a ref.fa.fai java -Xmx4g -jar CollectWgsMetrics.jar INPUT=in.bam REFERENCE_SEQUENCE=ref.fa OUTPUT="WGS_{1}.out" INTERVAL_LIST_STRING="{1}"

You might need to change -Xmx4g if more or less memory is needed.

ADD COMMENT
0
Entering edit mode

That would be a very useful addition to the tool.

ADD REPLY
0
Entering edit mode
10.2 years ago
Dan D 7.4k

I would split the BAM by chromosome:

bamtools split -in [bamfile] -reference

And then run the Picard CollectWGSMetrics tool on each of the BAM files. The output of the tool is highly regular so it would be easy to then combine the results into a single text file afterwards.

ADD COMMENT
0
Entering edit mode
10.2 years ago

I would use GATK DepthOfCoverage with a BED file containing one chromosome per line: chrom/0/leng(chrom)

ADD COMMENT
0
Entering edit mode

The last time I used DepthOfCoverage, it was really slow in comparison to Picard's complement. Also, I don't think it does the extensive filtering and reporting for duplicates, pass-filter reads, etc that CollectWgsMetrics performs. it's been a while since I've used the GATK tool though--perhaps it's been updated.

ADD REPLY

Login before adding your answer.

Traffic: 1920 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6