What tools are people using for QC'ing the initial stages of NGS data analysis?
I'm currently using FastQC and Picard, but I have been thinking of ways to improve on this (which led to the feature list below). I would appreciate tool recommendations and/or comments on features you like, dislike, or feel are missing in existing tools.
Thanks, -Ben
P.S. List of desired features:
- FastQC-style
  - user-friendly web interface
  - very easy to download, install, and run
- Picard-style
  - has stats for both single-end and paired-end data, and for both unaligned (e.g. %GC content) and aligned (e.g. insert size, coverage) reads
  - outputs data tables that are easy to read into R, etc.
- Extensibility
  - would be nice to have a simple way to add custom graphs and stats without having to recompile. Although Picard and FastQC are open source and modular, they are written in Java, which adds complexity and reduces the potential developer base relative to a language like Python. Perhaps a better option would be a toolkit consisting of language-agnostic sub-executables that, for example, write image and data-table files to disk and pass a message to the parent process that allows it to combine all the pieces into a coherent webpage.
- Other
  - would be good if the tool suggested typical causes for problems like excessive duplication or k-mer spikes, instead of just reporting graphs/stats/warnings.
  - the tool could simultaneously take a .fastq file and one or more processed BAM files from a single sample, and then show both unaligned and aligned stats in one interface, as well as how the stats change as a result of the different processing steps in the NGS pipeline.
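
To make the sub-executable idea under "Extensibility" more concrete, here is a rough Python sketch of what I have in mind. Everything in it is made up for illustration and is not how FastQC or Picard work: the calling convention (`<plugin> <input> <out_dir>`), the JSON manifest keys (`title`, `table`, `image`), and the names `run_plugin`, `build_report` and `demo` are all assumptions. The example plugin happens to be Python, but anything that follows the same convention should work.

```python
import html
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_plugin(cmd, input_path, out_dir):
    """Run one plugin executable and return its parsed JSON manifest.

    Assumed (hypothetical) contract: the plugin is invoked as
    `<cmd...> <input_path> <out_dir>`, writes its files into out_dir, and
    prints a single JSON object to stdout describing what it wrote.
    """
    result = subprocess.run(
        list(cmd) + [str(input_path), str(out_dir)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def build_report(manifests, out_dir):
    """Combine plugin manifests into one HTML page written to out_dir."""
    parts = ["<html><body><h1>QC report</h1>"]
    for m in manifests:
        parts.append(f"<h2>{html.escape(m['title'])}</h2>")
        if m.get("image"):
            parts.append(f"<img src='{html.escape(m['image'])}'>")
        if m.get("table"):
            link = html.escape(m["table"])
            parts.append(f"<p><a href='{link}'>data table (TSV)</a></p>")
    parts.append("</body></html>")
    report = Path(out_dir) / "report.html"
    report.write_text("\n".join(parts))
    return report


# A toy plugin (hypothetical): computes %GC from a 4-line-per-record FASTQ,
# writes a tab-separated table, and prints its manifest as JSON.
PLUGIN_SOURCE = r'''
import json, sys
from pathlib import Path
fastq, out_dir = sys.argv[1], Path(sys.argv[2])
lines = Path(fastq).read_text().splitlines()
seqs = [line.strip() for i, line in enumerate(lines) if i % 4 == 1]
total = sum(len(s) for s in seqs)
gc = sum(s.count("G") + s.count("C") for s in seqs)
(out_dir / "gc.tsv").write_text(
    "metric\tvalue\npercent_gc\t%.1f\n" % (100.0 * gc / total))
print(json.dumps({"title": "GC content", "table": "gc.tsv", "image": None}))
'''


def demo():
    """Run the toy plugin on a two-read FASTQ and build the report."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        plugin = tmp / "gc_plugin.py"
        plugin.write_text(PLUGIN_SOURCE)
        fastq = tmp / "reads.fastq"
        fastq.write_text("@r1\nGGCC\n+\nIIII\n@r2\nAATT\n+\nIIII\n")
        manifest = run_plugin([sys.executable, str(plugin)], fastq, tmp)
        report = build_report([manifest], tmp)
        table_text = (tmp / manifest["table"]).read_text()
        return manifest, table_text, report.read_text()
```

Calling `demo()` returns the manifest, the TSV contents and the generated HTML. The TSV is plain tab-separated text with a header row, so something like `read.delim("gc.tsv")` in R should load it directly. The parent process only ever sees the manifest and the files on disk, which is what would keep the plugins language-agnostic.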