Best pre and post alignment / variant calling QC tools for many WGS/RNA samples 2021
3.4 years ago
William ★ 5.3k

What are your favorite tools in 2021 for pre- and post-alignment QC and for variant calling QC?

Especially if you are dealing with many large whole genome sequenced samples.

My setup is currently:

FASTQ

BAM:

VCF

Summary QC report over many samples and FASTQ/BAM/VCF reports

This does a reasonable job. But a few things could be improved:

  • MultiQC works well for small to medium sets of samples, but not for 500+ samples. The report becomes difficult to interpret, and it is difficult to select samples with issues (e.g. sequencing issues) and drill down to them. All you get is a bee-swarm plot, without the possibility to find out which samples have strange QC values.
  • Samtools stats does not report coverage by itself. You need to calculate it yourself by dividing bases mapped by genome size.
  • Qualimap makes nice reports, but its (Java) CPU and memory usage is unreasonable, especially for large genomes and many samples.
  • Qualimap reports are difficult to summarize over many samples in MultiQC.
  • BCFtools stats can't output the desired per-sample stats in one pass. One pass per sample gives the best stats that can be loaded into MultiQC, but this takes long: either the 100GB+ BCF file is read many hundreds of times, or the multi-sample file first has to be split into single-sample BCFs.
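The coverage calculation from the Samtools stats point can be sketched roughly as below. This is a minimal sketch, assuming the usual tab-separated `SN` summary lines in `samtools stats` output; the function name `mean_coverage` and the example numbers are made up for illustration.

```python
def mean_coverage(stats_text: str, genome_size: int) -> float:
    """Estimate mean depth as mapped bases / genome size from `samtools stats` text."""
    values = {}
    for line in stats_text.splitlines():
        # Summary-number lines are assumed to look like:
        # SN<TAB>key:<TAB>value[<TAB># comment]
        if not line.startswith("SN\t"):
            continue
        fields = line.split("\t")
        values[fields[1].rstrip(":")] = fields[2]
    # Prefer the CIGAR-based count when present, else fall back to "bases mapped".
    mapped = values.get("bases mapped (cigar)", values.get("bases mapped"))
    if mapped is None:
        raise ValueError("no 'bases mapped' line found in the SN section")
    return int(mapped) / genome_size


# Invented example values, not real data.
example = (
    "SN\tbases mapped:\t93000000000\t# ignores clipping\n"
    "SN\tbases mapped (cigar):\t90000000000\t# more accurate\n"
)
print(mean_coverage(example, genome_size=3_000_000_000))
```

The genome size has to be supplied by hand (for example the summed contig lengths from the reference index), which is exactly the extra step the bullet complains about.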

So I am wondering what other people use: which tools, and how do you summarize the QC results over many samples while still being able to drill down to samples with issues?
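To make the "drill down" requirement concrete: what is needed is something that turns a per-sample metric into a short list of suspicious sample names. A minimal sketch of that idea, assuming the metric has already been collected into a dict, and using a median/MAD robust z-score as one possible rule (the function name, the threshold of 3.5, and the example values are all illustrative assumptions, not something any of the tools above provide):

```python
from statistics import median


def flag_outliers(metric_by_sample, threshold=3.5):
    """Return {sample: robust z-score} for samples far from the cohort median."""
    values = list(metric_by_sample.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return {}  # no spread at all, so nothing can be called an outlier
    flagged = {}
    for sample, value in metric_by_sample.items():
        # 0.6745 rescales the MAD so the score is comparable to a normal z-score.
        score = 0.6745 * (value - med) / mad
        if abs(score) > threshold:
            flagged[sample] = round(score, 2)
    return flagged


# Invented coverage values, not real data.
coverage = {"s1": 30.1, "s2": 29.8, "s3": 30.4, "s4": 30.0, "s5": 12.0}
print(sorted(flag_outliers(coverage)))
```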

fastq vcf bam • 1.5k views

It sounds like your use case is an outlier. Few people have hundreds of samples so the programs you mention above likely work reasonably well for most.


How about:

  • make 10 MultiQC reports with 50 samples each or so if you really feel like manually inspecting them.
  • mosdepth for coverage; I guess nothing beats it in speed and memory usage.
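For the mosdepth suggestion, the genome-wide mean depth could be pulled out of the per-sample summary file with something like the sketch below. It assumes the summary is tab-separated with a header containing a `mean` column and a final row labelled `total`; the function name and the example rows are invented for illustration.

```python
def total_mean_depth(summary_text: str) -> float:
    """Return the mean depth from the 'total' row of a mosdepth-style summary."""
    lines = summary_text.strip().splitlines()
    header = lines[0].split("\t")
    mean_col = header.index("mean")  # locate the column by name, not by position
    for line in lines[1:]:
        fields = line.split("\t")
        if fields[0] == "total":
            return float(fields[mean_col])
    raise ValueError("no 'total' row found in summary")


# Invented example rows, not real data.
summary = (
    "chrom\tlength\tbases\tmean\tmin\tmax\n"
    "chr1\t1000\t30000\t30.00\t0\t80\n"
    "chr2\t500\t10000\t20.00\t0\t60\n"
    "total\t1500\t40000\t26.67\t0\t80\n"
)
print(total_mean_depth(summary))
```

Collecting this one number per sample gives a small table that is easy to sort or filter, independent of how many samples go into each MultiQC report.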

It is possible to increase the MultiQC sample number limits for interactive plots and tables by creating a modified ~/.multiqc_config.yaml in your home directory; see the example at https://github.com/ewels/MultiQC/blob/44f28ef0726bc65fd965aa99d5a19f7745c749c4/test/config_example.yaml. The limit for the interactive table was 500, so I was just above it. After increasing the sample limit to 5000, the interactive table works with 500+ samples, but the interactive plots are very slow, and the entire HTML report is very slow.
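A minimal sketch of generating such a config file. The key name `max_table_rows` is my assumption for the table limit described above and should be checked against the linked example config for your MultiQC version; the helper name `write_multiqc_config` is made up.

```python
from pathlib import Path

# Assumed key name: verify against the example config linked above.
CONFIG_TEXT = """\
# Keep tables interactive up to this many samples (the limit was 500 for me)
max_table_rows: 5000
"""


def write_multiqc_config(path: Path) -> Path:
    """Write the config text to `path` and return the path."""
    path.write_text(CONFIG_TEXT)
    return path


if __name__ == "__main__":
    # Writes into the current directory; copy to ~/.multiqc_config.yaml to apply it.
    print(write_multiqc_config(Path("multiqc_config.yaml")))
```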
