In NGS, what is recalibration? How should I do it? Why should I care?
Many thanks
Pierre
Pierre,
I've been using the Broad's GATK base quality score recalibration, which Michael mentions, to recalibrate quality scores after alignment and before SNP calling.
Because the recalibrated scores are based on the machine data (the original score), alignment information (the recalibration), and cycle/tile/sequence information (also part of the recalibration), they should be more reliable for SNP finding.
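To make the idea concrete: recalibration replaces the machine-reported Phred score of a group of bases with the error rate actually observed for that group (mismatches against the reference, ignoring known dbSNP sites). Below is a minimal Python sketch of that conversion. The function names and the +1/+2 pseudocount are my own assumptions for illustration, not GATK's exact formula.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def empirical_phred(mismatches: int, observations: int) -> float:
    """Phred-scaled quality actually observed in one covariate bin.

    Assumption: a simple pseudocount (+1 mismatch, +2 observations) keeps
    empty or error-free bins finite; GATK's own smoothing may differ.
    """
    error_rate = (mismatches + 1) / (observations + 2)
    return -10.0 * math.log10(error_rate)


# A bin whose bases were reported as Q30 but show 99 mismatches in 10,000.
reported_q = 30
print(f"reported error rate: {phred_to_error(reported_q):.4f}")   # 0.0010
print(f"empirical quality:   Q{empirical_phred(99, 10_000):.1f}")  # Q20.0
```

Here a bin the machine called Q30 behaves more like Q20. Overconfident scores like that can make sequencing errors look like real variants to a SNP caller, which is the main reason to care.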
In practice, it's a two-step process. First, you count the covariate data according to the covariates you specify:
java -Xmx4g -jar GenomeAnalysisTK.jar \
-T CountCovariates \
-cov ReadGroupCovariate \
-cov QualityScoreCovariate \
-cov CycleCovariate \
-cov DinucCovariate \
-cov TileCovariate \
-recalFile recal_data.csv \
-I aligned.duplicates_marked.bam \
-R /path/to/reference.fasta \
-l INFO -U \
--use_original_quals \
-B dbsnp,PicardDbSNP,/path/to/reference.dbsnp
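Conceptually, the counting step walks every aligned base, groups it by its covariates, and tallies how many bases and how many reference mismatches fall into each bin. Positions in dbSNP are skipped, which is what the `-B dbsnp` binding provides. The sketch below is a simplified, hypothetical Python version of that idea. The `BaseObservation` record, the function names and the CSV columns are invented for illustration, the tile covariate is left out for brevity, and the output is not GATK's actual `recal_data.csv` format.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


# Hypothetical per-base record; a real tool extracts these fields from the BAM.
class BaseObservation(NamedTuple):
    read_group: str
    reported_qual: int
    cycle: int          # position of the base within the read (1-based)
    dinuc: str          # previous base + this base, e.g. "AC"
    is_mismatch: bool   # base disagrees with the reference
    at_known_snp: bool  # reference position is listed in dbSNP


Key = Tuple[str, int, int, str]


def count_covariates(bases: Iterable[BaseObservation]) -> Dict[Key, List[int]]:
    """Tally [observations, mismatches] per covariate bin, skipping dbSNP sites."""
    table: Dict[Key, List[int]] = defaultdict(lambda: [0, 0])
    for b in bases:
        if b.at_known_snp:
            # Mismatches at known variant sites are probably real variation,
            # not sequencing error, so counting them would bias the estimate.
            continue
        key = (b.read_group, b.reported_qual, b.cycle, b.dinuc)
        table[key][0] += 1
        if b.is_mismatch:
            table[key][1] += 1
    return dict(table)


def table_to_csv(table: Dict[Key, List[int]]) -> str:
    """Serialize the tally; the column names are illustrative, not GATK's."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ReadGroup", "QualityScore", "Cycle", "Dinuc",
                     "nObservations", "nMismatches"])
    for (rg, q, cycle, dinuc), (n_obs, n_mm) in sorted(table.items()):
        writer.writerow([rg, q, cycle, dinuc, n_obs, n_mm])
    return buf.getvalue()
```

Skipping known SNPs is the key design choice: without it, true variants would be counted as errors and every bin would look worse than it is.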
Then you use this table to write out recalibrated quality scores:
java -jar GenomeAnalysisTK.jar \
-T TableRecalibration \
-recalFile recal_data.csv \
-R /path/to/reference.fasta \
-I aligned.duplicates_marked.bam \
-outputBam recal.gatk.bam \
-l INFO -U \
--use_original_quals
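The second step rewrites each base's quality using the counts. Here is a minimal sketch of that lookup, assuming a hypothetical table keyed by (read group, reported quality, cycle, dinucleotide) that maps to (observations, mismatches). `recalibrate_read`, the pseudocount and the Q60 cap are assumptions for illustration, not TableRecalibration's internals. As far as I know, GATK combines separate per-covariate corrections rather than doing one joint-bin lookup like this.

```python
import math
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, int, int, str]  # (read group, reported quality, cycle, dinucleotide)


def empirical_phred(mismatches: int, observations: int) -> float:
    """Observed quality of a bin, with an assumed +1/+2 pseudocount."""
    return -10.0 * math.log10((mismatches + 1) / (observations + 2))


def recalibrate_read(read_group: str, bases: str, quals: Sequence[int],
                     table: Dict[Key, Tuple[int, int]],
                     max_q: int = 60) -> List[int]:
    """Replace each base's quality with the empirical quality of its bin."""
    new_quals: List[int] = []
    prev_base = "N"  # the first base has no predecessor
    for cycle, (base, q) in enumerate(zip(bases, quals), start=1):
        key = (read_group, q, cycle, prev_base + base)
        prev_base = base
        counts = table.get(key)
        if counts is None:
            new_quals.append(q)  # bin never seen while counting: keep original
        else:
            n_obs, n_mm = counts
            new_quals.append(min(max_q, round(empirical_phred(n_mm, n_obs))))
    return new_quals


table = {("rg1", 30, 1, "NA"): (10_000, 99)}
print(recalibrate_read("rg1", "AC", [30, 30], table))  # [20, 30]
```

The first base falls into a bin with an observed error rate near 1%, so its Q30 becomes Q20; the second base's bin was never counted, so it keeps its original score.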
I've written some code to analyze and display the recalibrated scores and should have a full blog post on it, but in the meantime here's a plot that shows the score distribution before and after recalibration.
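The analysis code mentioned above isn't shown here. As a stand-in, this is a minimal sketch, with hypothetical function names of my own, that tabulates the fraction of bases at each quality score before and after recalibration, i.e. the data a before/after distribution plot would show. It assumes the per-base quality lists have already been pulled out of the two BAMs.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def quality_fractions(quals: Iterable[int]) -> Dict[int, float]:
    """Fraction of bases at each quality score (empty input gives {})."""
    counts = Counter(quals)
    total = sum(counts.values())
    return {q: n / total for q, n in counts.items()} if total else {}


def compare_distributions(original: Iterable[int],
                          recalibrated: Iterable[int]) -> List[Tuple[int, float, float]]:
    """Rows of (quality, fraction before, fraction after), sorted by quality."""
    before = quality_fractions(original)
    after = quality_fractions(recalibrated)
    return [(q, before.get(q, 0.0), after.get(q, 0.0))
            for q in sorted(set(before) | set(after))]


for q, b, a in compare_distributions([30, 30, 20], [20, 20, 20]):
    print(f"Q{q:<3} before={b:.2f} after={a:.2f}")
```

Normalizing to fractions rather than raw counts keeps the two distributions comparable even if the inputs differ in size.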
No idea! But do you mean this: http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration ?