I want to call SNPs from exome sequencing data, and I found really good documentation on SEQanswers. I have one question about the quality score recalibration step in GATK (http://seqanswers.com/wiki/How-to/exome_analysis#Quality_score_recalibration), which uses dbSNP. The tutorial uses dbSNP data from UCSC, but I currently use Ensembl GRCh38 as my genome reference. Can I use the UCSC dbSNP file with data aligned to the Ensembl reference? I also checked the Ensembl FTP site and found this link and this one. Which one should I use, given that the tutorial uses a txt file from UCSC (I checked, and the UCSC file type is still txt)? Thank you for your answer.
You should use this file ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/VCF/All.vcf.gz for your purpose if you have aligned your reads to the GRCh38 version of the genome. But I'm not sure you are using the right guide for score recalibration. According to this post http://gatkforums.broadinstitute.org/discussion/1248/countcovariates, the CountCovariates tool is no longer supported by GATK. For base quality recalibration, BaseRecalibrator is recommended, and after SNP calling on exome (or WGS) data it is recommended to perform automatic variant quality recalibration with VariantRecalibrator (not with VariantFiltration, as said on SEQanswers). You can find the documentation directly on the GATK site, and it is really good.
Thank you. I'm using it right now, but I want to ask about something. The description of BaseRecalibrator says: "This tool is designed to work as the first pass in a two-pass processing step." So, what is the second pass? I cannot find the second step, and I checked that CountCovariates and TableRecalibrator from the SEQanswers tutorial no longer exist.
This may be useful: https://www.broadinstitute.org/gatk/guide/article?id=44 and https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_readutils_PrintReads.php. First you use BaseRecalibrator, which generates a .grp table, and then you use PrintReads with the -BQSR argument to apply it.
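To make the two passes concrete, here is a rough sketch that only builds and prints the two command lines (it does not run GATK). It assumes GATK 3.x-style syntax, where tools are selected with -T. The jar path and every file name (ref.fa, sample.bam, dbsnp.vcf, recal.grp, sample.recal.bam) are placeholders I made up, so check the options against the documentation for the GATK version you have installed.

```python
import shlex

# Assumed location of a GATK 3.x jar; adjust to your installation.
GATK_JAR = "GenomeAnalysisTK.jar"


def base_recalibrator_cmd(reference, bam, known_sites, table):
    """First pass: BaseRecalibrator writes the recalibration table (.grp)."""
    return ["java", "-jar", GATK_JAR, "-T", "BaseRecalibrator",
            "-R", reference, "-I", bam,
            "-knownSites", known_sites, "-o", table]


def print_reads_cmd(reference, bam, table, out_bam):
    """Second pass: PrintReads applies the table through -BQSR."""
    return ["java", "-jar", GATK_JAR, "-T", "PrintReads",
            "-R", reference, "-I", bam,
            "-BQSR", table, "-o", out_bam]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # All file names below are placeholders for illustration only.
    first = base_recalibrator_cmd("ref.fa", "sample.bam", "dbsnp.vcf", "recal.grp")
    second = print_reads_cmd("ref.fa", "sample.bam", "recal.grp", "sample.recal.bam")
    print(as_shell(first))
    print(as_shell(second))
```

The table written by the first command is the same file passed to -BQSR in the second, and the output of the second command is the recalibrated BAM you would carry forward to variant calling.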