So, I am reading up on the documentation for the GATK tool RealignerTargetCreator, and right off the bat I see that you can use a parameter called -known. The documentation says this parameter is used to specify a VCF file of known indels. So are these things like the VCF files we download from the dbSNP or 1000 Genomes websites? It seems like a very obvious choice, but I just want to make sure that I am on the right track.
Thank you in advance,
Young
Yes. Basically, they are asking you to provide a list of known indels in your species of interest. The GATK realigner takes the BAM file and tries to realign the reads at those positions. I don't work with human data, but I think either of your options, dbSNP or the 1000 Genomes indels, should work for you. I work with mouse and normally use the indel data from the 17 mouse strains that were sequenced as part of the MGP. In short, the more known indels you can supply, the better.
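A dbSNP or 1000 Genomes download usually mixes SNPs and indels in one file. If you want a file that contains only the indel records before passing it to -known, a minimal sketch is below. It assumes an uncompressed, tab-separated VCF on standard input (a bgzipped download would need to be decompressed first), and the script name keep_indels.py is hypothetical. Whether GATK actually needs a pre-filtered file isn't something this thread establishes; this just keeps the file small and makes explicit which records count as indels (any ALT allele whose length differs from REF).

```python
import sys
from typing import Iterable, Iterator


def is_indel(ref: str, alts: str) -> bool:
    """Return True if any ALT allele differs in length from REF.

    Symbolic alleles such as <DEL>, the missing allele '.', and the
    spanning-deletion allele '*' are skipped, because their string
    length says nothing about the size of the event.
    """
    for alt in alts.split(","):
        if alt.startswith("<") or alt in (".", "*"):
            continue
        if len(alt) != len(ref):
            return True
    return False


def keep_indels(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF header lines unchanged, plus only the indel records."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information (##) and column header (#CHROM) lines.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        ref, alts = fields[3], fields[4]
        if is_indel(ref, alts):
            yield line


if __name__ == "__main__":
    # Hypothetical usage:
    #   python keep_indels.py < all_variants.vcf > known_indels.vcf
    sys.stdout.writelines(keep_indels(sys.stdin))
```

Header lines are passed through untouched so the output is still a valid VCF that other tools can read.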