Hello folks,
I followed this pipeline to process some single-read NGS data:
Since we used the Agilent SureSelect kit, we have a .bed file that lists the regions of interest. I thought about reducing the effort by using this file. Would it make sense to skip the 6th step and supply the .bed file instead?
java -jar /bin/GTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /seq/REFERENCE/human_18.fasta -I /output/FOO.sorted.bam -o /output/FOO.intervals
Are there any further improvements possible with this file?
Any help appreciated, Oliver
Thank you for your answer, Brad. Can you tell me more about step 6? This is no longer related to my initial question, but I want to understand what exactly step 6 does. How does the tool detect misalignments? (My thinking is: if it has to find misalignments, why weren't the reads aligned correctly in the first step?)
Oliver, glad that it helped. The realignment step has the advantage of looking at all the reads aligned to the same position together, while the initial alignment only considers one read at a time. As a result, realignment can identify regions where the placement of an indel can be adjusted to avoid mismatches. The GATK wiki has a good description of the approach: http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels
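
To make the idea concrete, here is a small toy sketch in Python. It is not GATK's algorithm or scoring, just an illustration under made-up assumptions: the reference, the three reads, and the 3-base deletion at position 7 are all hypothetical. Each read was placed on its own without a gap, so the bases after the deletion show up as mismatches. Testing one shared candidate deletion against all the reads at once shows that a single indel explains every mismatch.

```python
# Toy illustration only: not GATK's actual algorithm or scoring.
# The reference, reads, and candidate deletion are invented for this example.
REF = "GATTACAGGCTTAACCGTGA"

# Hypothetical reads as (start on reference, sequence), each placed without a gap,
# the way an aligner working on one read at a time might place them.
READS = [
    (0, "GATTACATTA"),
    (1, "ATTACATTA"),
    (3, "TACATTAAC"),
]


def apply_deletion(ref, pos, length):
    """Return the alternate haplotype: the reference with `length` bases removed at `pos`."""
    return ref[:pos] + ref[pos + length:]


def mismatches(seq, haplotype, start):
    """Count mismatching bases when `seq` is laid on `haplotype` at `start`, without gaps."""
    window = haplotype[start:start + len(seq)]
    return sum(a != b for a, b in zip(seq, window))


def total_mismatches(reads, haplotype):
    """Sum the mismatches over all reads covering the region."""
    return sum(mismatches(seq, haplotype, start) for start, seq in reads)


# Candidate indel: delete the 3 bases "GGC" at position 7.
# All reads start at or before position 7, so their start coordinates
# are the same on the alternate haplotype as on the reference.
ALT = apply_deletion(REF, 7, 3)

before = total_mismatches(READS, REF)  # reads scored against the reference
after = total_mismatches(READS, ALT)   # same reads scored against the deletion haplotype
print(before, after)
```

In this toy case the reads have 11 mismatches in total against the reference and none against the haplotype carrying the deletion, so placing the same deletion in every read is the better explanation. A real realigner weighs the evidence in a more careful way than a plain mismatch count; the GATK wiki page linked above describes what it actually does.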