I have a query that we do the local realignment around the indels with the known sites because it identifies most parsimonious alignment along all reads at a problematic locus. InDels in reads among the near ends can cause the mappers into mis-aligning with mismatches. These artifactual mismatches can harm base quality recalibration and variant detection.
However if one is not interested in performing this step as one might not be interested in the InDel assessment downstream but if one wants a local realignment of the mapped reads around the exonic regions using the bed files containing the probes that was provided by the company to check the quality of reads that mapped only to the exonic regions. Can this be done? I am particularly interested in trying to see how many of my mapped reads which aligned to the whole hg19 genome is actually getting mapped on the exonic region, then we can assess the coverage of the reads in that way.
I believe we can do this also by using the DepthofCoverage walker and use it on the local indel realigned file along with the target interval bed file which captures the probe for the exonic regions to find out the statistics of the mapping. But if we skip the local realignment around the indels and rather perform alignment around the exonic region with the bed files before variant calling can we do that? Is there any walker in the GATK 2.3 which can perform this task? Or are there any other tool which can be used to perform this step just aligning the mapped reads to the exonic region and then do a quality check i mean the read counts that pertain to only the exonic region as this will give me more robust variant calling downstream and I will not have to be worried at the end of losing out a lot of reads that did not get excluded owing to regions of introns in the reference genome. If anyone has perform such kind of analysis please advice and what tool has been used to do this? be it the SAMTOOLs or the GATK.
Since am using the GATK , I would prefer some walker in the GATK that can perform that task, please suggest.
Thanks
If you don't want local realignment then don't do it. Many variant callers (most?) don't do realignment at all (samtools for example), plus that process only makes a difference for indels anyhow. If you want to map against exome, then use that as your target database, his makes a difference only if you are using a prepackaged pipeline software that strictly requires certain naming/files to be present.
But the target ref gene for hg19 is the whole genome. Is it possible to retrieve only exonic regions of the hg19 somehow and then align the reads with bwa/bowtie to that but sadly am using GATK which will restrict my downstream analysis as it will not work for that kind of target exomes, it only works for the genome wide. It would be highly appreciable if you have performed such analysis or not. let me know. Many thanks for the reply
that is a different question, I would post that separately, there are many ways to extract the sequences for genes. Retrieve the sequences from Ensemble, use a masked the genome, extract sequences with a command line tool. It all depends on what you are familiar with.