Question

Questions On Specifying And Merging Scatter/Gather In GATK Parallelism

Asked 12.7 years ago by C Shao

Here to bother you again :-)

I currently use GATK a lot to analyze sequencing data, but many steps take a really long time.

The GATK wiki page on parallelism recommends using scatter/gather to speed things up. However, I don't fully understand how to do it.

First, how do I merge the results of a scatter/gather run? For example, take this code from the GATK wiki:

gsa1> java -jar GenomeAnalysisTK.jar -R human.fasta -T UnifiedGenotyper -I my.bam -L chr1:1-125,000,000 -o my.1.vcf &
gsa1> java -jar GenomeAnalysisTK.jar -R human.fasta -T UnifiedGenotyper -I my.bam -L chr1:125,000,001-249,250,621 -o my.2.vcf &

The wiki then says: "When these two jobs finish, I just merge the two VCFs together and I've got a complete data set in half the time."
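The "scatter" half of the wiki example is just cutting one contig into non-overlapping pieces and launching one job per piece. Below is a minimal sketch of that step, assuming 1-based inclusive "contig:start-end" intervals and the same UnifiedGenotyper arguments as the wiki commands. The function names and the output prefix are hypothetical, and the script only builds and prints the command lines rather than running them.

```python
def scatter_intervals(contig, length, n_shards):
    """Split positions 1..length of `contig` into exactly n_shards contiguous,
    non-overlapping, 1-based inclusive intervals like 'chr1:1-1000'."""
    if not 1 <= n_shards <= length:
        raise ValueError("need 1 <= n_shards <= length")
    base, extra = divmod(length, n_shards)
    intervals = []
    start = 1
    for i in range(n_shards):
        # The first `extra` shards get one extra base so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        intervals.append(f"{contig}:{start}-{end}")
        start = end + 1
    return intervals


def scatter_commands(reference, bam, intervals, prefix="my"):
    """Build one UnifiedGenotyper command (as an argv list) per shard,
    mirroring the wiki example; outputs are named <prefix>.<n>.vcf."""
    return [
        ["java", "-jar", "GenomeAnalysisTK.jar", "-R", reference,
         "-T", "UnifiedGenotyper", "-I", bam, "-L", interval,
         "-o", f"{prefix}.{n}.vcf"]
        for n, interval in enumerate(intervals, start=1)
    ]


if __name__ == "__main__":
    # chr1 length as used in the wiki example (249,250,621 bp).
    shards = scatter_intervals("chr1", 249_250_621, 2)
    for cmd in scatter_commands("human.fasta", "my.bam", shards):
        print(" ".join(cmd))
```

Each argv list could be handed to subprocess.Popen to run the shards in parallel, which is what the trailing "&" does in the wiki commands. Splitting evenly rather than at round numbers is only a convenience; any non-overlapping split works as long as the pieces are later gathered in genomic order.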

But VCF files have headers, so how can I merge these VCF files automatically? I have the same problem with BAM files. I found "MergeSamFiles" in Picard; is it a solution for merging BAM files? Will it handle BAM files with different headers?

Second, to specify multiple chromosomes, should I use

-L chr1 chr2 chr3

or

-L chr1
-L chr2

Thanks!

What kind of computational resources do you have access to? If you have a compatible cluster, you might want to check out GATK Queue (http://www.broadinstitute.org/gatk/guide/topic?name=tutorials), which can handle scatter/gather automatically.

Comment by Johan, 12.7 years ago

Answer · 2012-07-27

Answered 12.7 years ago by Pierre Lindenbaum

The two VCFs can be merged using VCFtools: http://vcftools.sourceforge.net/docs.html#merge
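For the wiki's case (same samples, different regions) the "gather" step is really a concatenation rather than a merge of different samples; VCFtools also ships a vcf-concat script for that. The idea behind it, and the answer to the header question, is: keep the header from the first shard only and append the data records of every shard in genomic order. A minimal sketch, assuming uncompressed shards produced by the same run (same reference, same samples in the same column order) and listed in genomic order with non-overlapping intervals; the function name is hypothetical, and it checks only that the "#CHROM" sample columns agree, not that the other "##" header lines do.

```python
def concat_vcfs(shard_paths, out_path):
    """Gather per-region VCF shards of the SAME samples into one VCF.

    Assumes shard_paths are plain-text VCFs already in genomic order with
    non-overlapping intervals. Only the first shard's header is written;
    later shards contribute data records only.
    """
    if not shard_paths:
        raise ValueError("no shards given")
    expected_columns = None  # the '#CHROM ...' line of the first shard
    with open(out_path, "w") as out:
        for i, path in enumerate(shard_paths):
            with open(path) as shard:
                for line in shard:
                    if line.startswith("#"):
                        # Refuse to concatenate shards whose sample columns differ.
                        if line.startswith("#CHROM"):
                            if expected_columns is None:
                                expected_columns = line
                            elif line != expected_columns:
                                raise ValueError(
                                    f"{path}: sample columns differ from first shard")
                        if i == 0:
                            out.write(line)  # header lines from the first shard only
                    else:
                        out.write(line)  # data records, kept in shard order
```

Usage would be concat_vcfs(["my.1.vcf", "my.2.vcf"], "my.vcf"). Because the shards do not overlap and are passed in order, no sorting or record-level merging is needed; if either assumption does not hold, a real merge or sort tool is the safer choice.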

If you want to handle several regions, as far as I know GATK uses this kind of list: http://www.broadinstitute.org/gsa/wiki/index.php/Input_files_for_the_GATK#Intervals
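On the second question, my understanding (an assumption to check against the intervals page linked above) is that GATK expects -L to be repeated once per interval, as in "-L chr1 -L chr2", rather than "-L chr1 chr2 chr3", and that for many regions it also accepts a text file with one interval per line, passed as a single -L argument; the accepted file extensions (for example ".intervals" or ".list") are described on that page. A minimal sketch of building both forms; the helper names and the file name "my.intervals" are hypothetical.

```python
def interval_args(intervals):
    """Repeat -L once per interval: ['chr1', 'chr2'] -> ['-L', 'chr1', '-L', 'chr2']."""
    args = []
    for interval in intervals:
        args += ["-L", interval]
    return args


def write_interval_list(intervals, path):
    """Write one interval per line (e.g. 'chr1' or 'chr2:1-1000') and
    return the single -L argument pair that points GATK at the file."""
    with open(path, "w") as fh:
        for interval in intervals:
            fh.write(interval + "\n")
    return ["-L", path]


if __name__ == "__main__":
    print(" ".join(interval_args(["chr1", "chr2", "chr3"])))
    print(" ".join(write_interval_list(["chr1", "chr2:1-1000"], "my.intervals")))
```

Repeated -L flags are convenient for a handful of chromosomes; a file keeps the command line short once there are dozens of regions, and the same file can be reused across tools.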
