picardmetrics
Picardmetrics is a Bash script that simplifies calling Picard tools and collates the different output files generated by Picard. It also has functions for generating the two input files required by CollectRnaSeqMetrics.
In order, picardmetrics run will do the following:
- Automatically create a sequence dictionary using your reference sequence.
- Create a new temporary BAM file that you can keep with option -k.
- Reorder the header of the BAM file to match the reference.
- Sort the reads in the BAM file by coordinate.
- Mark duplicates in the BAM file and report duplicate metrics.
- Run up to 8 additional Picard tools.
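The first few steps above can be sketched as plain Picard commands. This is a dry run that only prints the commands, and it is not how picardmetrics itself is implemented. The jar path (PICARD_JAR), the input file names, and the intermediate file suffixes are assumptions made for illustration. Argument spellings, such as the reference argument to ReorderSam, differ between Picard releases, so check them against your installed version.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print Picard commands that roughly correspond to the
# steps listed above. Nothing is executed.
# Assumptions (not from picardmetrics): PICARD_JAR, file names, suffixes.

PICARD_JAR="${PICARD_JAR:-picard.jar}"   # hypothetical path to the Picard jar

print_steps() {
  local bam="$1" ref="$2"
  local prefix="${bam%.bam}"     # sample.bam      -> sample
  local dict="${ref%.*}.dict"    # reference.fasta -> reference.dict

  # Create a sequence dictionary from the reference sequence.
  echo "java -jar $PICARD_JAR CreateSequenceDictionary R=$ref O=$dict"
  # Write a temporary BAM whose header is reordered to match the reference.
  echo "java -jar $PICARD_JAR ReorderSam I=$bam O=$prefix.reordered.bam R=$ref"
  # Sort the reads by coordinate.
  echo "java -jar $PICARD_JAR SortSam I=$prefix.reordered.bam O=$prefix.sorted.bam SORT_ORDER=coordinate"
  # Mark duplicates and write the duplicate metrics file.
  echo "java -jar $PICARD_JAR MarkDuplicates I=$prefix.sorted.bam O=$prefix.marked.bam M=$prefix.duplicate_metrics.txt"
}

print_steps sample.bam reference.fasta
```

Printing the commands instead of running them makes it easy to review what would happen to each BAM file before committing compute time.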
After running the tools, use picardmetrics collate to merge all of the generated metrics from multiple BAM files into tab-delimited files. Additionally, all of these tab-delimited files are consolidated into a single file with all metrics from all BAM files and all Picard tools.
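To illustrate the idea behind collating, here is a minimal sketch that merges the metrics table from several Picard output files into one tab-delimited table. It is not the code used by picardmetrics collate. It assumes that every input file comes from the same Picard tool, that the table header is the line right after "## METRICS CLASS", and that the sample name is the file name up to the first dot. The function name collate_metrics and the toy values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: merge the metrics tables of several Picard metrics files.
# Assumption: all files come from the same Picard tool (same columns).

collate_metrics() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                        # first line of each input file
      state = 0
      sample = FILENAME
      sub(/.*\//, "", sample)         # drop the directory part
      sub(/\..*$/, "", sample)        # drop everything after the first dot
    }
    /^## METRICS CLASS/ { state = 1; next }    # the next line is the header
    state == 1 {                      # header row: print it only once
      if (!printed++) print "SAMPLE", $0
      state = 2; next
    }
    state == 2 && $0 == "" { state = 0; next } # a blank line ends the table
    state == 2 { print sample, $0 }   # data row, prefixed with the sample name
  ' "$@"
}

# Toy input: two tiny metrics files with made-up values.
tmp="$(mktemp -d)"
printf '## METRICS CLASS\tpicard.sam.DuplicationMetrics\nLIBRARY\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION\nlibA\t1000\t0.10\n\n## HISTOGRAM\tjava.lang.Double\nBIN\tVALUE\n1.0\t1.0\n' \
  > "$tmp/sampleA.duplicate_metrics.txt"
printf '## METRICS CLASS\tpicard.sam.DuplicationMetrics\nLIBRARY\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION\nlibB\t2000\t0.25\n' \
  > "$tmp/sampleB.duplicate_metrics.txt"

collate_metrics "$tmp"/*.duplicate_metrics.txt
rm -rf "$tmp"
```

The result has one header row and one row per sample, which is a convenient shape for loading into R or a spreadsheet. Histogram sections are skipped because the table is considered finished at the first blank line.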
Download
Download the latest release here: https://github.com/slowkow/picardmetrics/releases/latest
Example
Use Picard to assess the quality of your sequencing data. This example shows RNA-seq data from hundreds of glioblastoma cells and gliomasphere cell lines.
On the left, each sample is a point, and we see that samples with high mean mapping quality have the greatest number of detected genes. Further, the color of points reveals variation in the percent of reads per sample that are assigned to exons.
On the right, each sample is a bar, broken down into the percent of sequenced bases coming from different genomic regions. We see that many samples have few sequenced bases coming from coding regions relative to intergenic regions.
Thanks Kamil. This will come in handy for newbies.
Looks excellent! Will come in handy for experienced users as well.