GATK RealignerTargetCreator
10.7 years ago
youngcsong

So, I am reading up on the documentation for the GATK tool RealignerTargetCreator, and right off the bat I see that you can use a parameter called -known. The documentation says this parameter specifies a VCF file of known indels. Does that mean files like the VCFs we download from the dbSNP or 1000 Genomes websites? It seems like an obvious choice, but I just want to make sure I am on the right track.

Thank you in advance,

Young

Tags: gatk, vcf

Yes. Basically, they are asking you to provide a list of known indels for your species of interest. The GATK realigner takes the BAM file and tries to realign reads around those positions. I don't work with human, but I think both of your options, dbSNP or 1000 Genomes indels, should work for you. I work with mouse and normally use the indel data from the 17 mouse strains that were sequenced as part of the MGP (Mouse Genomes Project). In short, the more indels, the better.

10.7 years ago

GATK comes with a resource bundle: http://gatkforums.broadinstitute.org/discussion/1213/what-s-in-the-resource-bundle-and-how-can-i-get-it

(...)

The current best set of known indels to be used for local realignment (note that we don't use dbSNP for this anymore); use both files:

  • 1000G_phase1.indels.b37.vcf (currently from the 1000 Genomes Phase I indel calls)
  • Mills_and_1000G_gold_standard.indels.b37.sites.vcf

(...)


Hi Pierre, thanks for your reply. The document you linked describes everything in detail, and I managed to grab both files from their FTP site. However, I am a bit confused about what they mean by "use both files." Does that mean I can specify both of these files with the '-known' parameter? According to the documentation, it seems this parameter accepts a list: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_indels_RealignerTargetCreator.html


Let me take the liberty of answering on Pierre's behalf. Yes, you can provide one or more VCF files with the '-known' parameter.


Hi Ashutosh, thank you for the reply. One more question before I embark on my journey: do I separate the paths of these files with a space or a comma, or do I have to list the paths in a separate file and use that as the input instead?


Frankly, I don't remember, as it has been a while since I used more than one file. I think you need to put a comma between them. GATK prints the exact command it will run, so you can check whether it is reading only one file or both. Just try the different options; it should be easy to figure out.


Okay, so I figured it out: it turns out that I need to use the -known parameter twice, once for each VCF file. An example command would be:

    java -Xmx2g -jar GenomeAnalysisTK-2.8-1/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference_fasta.fa -I input_bam.bam -o output_file.intervals -known known_indel_1.vcf -known known_indel_2.vcf

You must specify the FULL PATH of the reference FASTA and of the known indel VCF files for this to work.
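
If you have several known-indel files, the "one -known per VCF" pattern can be scripted instead of typed by hand. Below is a minimal bash sketch, assuming the GATK 3.x-style syntax used in the command above. The jar, reference, BAM and output names are the placeholders from that command, the two VCF names are the resource-bundle files Pierre listed, and the abs_path helper is hypothetical. Following the note about full paths, it expands relative paths against the current directory. It only prints the command (a dry run) so you can check it before launching GATK.

```shell
#!/usr/bin/env bash
# Sketch: build (and print) a GATK 3.x RealignerTargetCreator command that
# passes each known-indel VCF with its own -known flag.
set -euo pipefail

# Placeholder file names; replace them with your own.
GATK_JAR="GenomeAnalysisTK-2.8-1/GenomeAnalysisTK.jar"
REFERENCE="reference_fasta.fa"
INPUT_BAM="input_bam.bam"
OUTPUT_INTERVALS="output_file.intervals"
KNOWN_VCFS=(
  "1000G_phase1.indels.b37.vcf"
  "Mills_and_1000G_gold_standard.indels.b37.sites.vcf"
)

# Hypothetical helper: turn a relative path into an absolute one
# (the file does not need to exist yet).
abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s/%s\n' "$PWD" "$1" ;;
  esac
}

cmd=(java -Xmx2g -jar "$(abs_path "$GATK_JAR")"
     -T RealignerTargetCreator
     -R "$(abs_path "$REFERENCE")"
     -I "$(abs_path "$INPUT_BAM")"
     -o "$(abs_path "$OUTPUT_INTERVALS")")

# One -known flag per VCF, not a comma- or space-separated list.
for vcf in "${KNOWN_VCFS[@]}"; do
  cmd+=(-known "$(abs_path "$vcf")")
done

# Dry run: print the command with shell quoting.
printf '%q ' "${cmd[@]}"
printf '\n'
```

To actually launch GATK, replace the two final printf lines with "${cmd[@]}". Building the arguments as a bash array rather than one long string keeps each path a single argument, even if it contains spaces.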


Great. Glad that you could figure it out. Thanks for posting the solution.
