Questions On Specifying And Merging Scatter/Gather In GATK Parallelism
12.3 years ago
C Shao ▴ 140

Here to bother you again :-)

I currently use GATK a lot to analyze sequencing data, but many steps take a really long time.

The wiki page on GATK parallelism recommends using scatter/gather to speed things up. However, I don't fully understand how to do it.

First, how do I merge the results of scatter/gather? For example, this code is from the GATK wiki:

gsa1> java -jar GenomeAnalysisTK -R human.fasta -T UnifiedGenotyper -I my.bam -L chr1:1-125,000,000 -o my.1.vcf &
gsa1> java -jar GenomeAnalysisTK -R human.fasta -T UnifiedGenotyper -I my.bam -L chr1:125,000,001-249,250,621 -o my.2.vcf &

and the wiki says: "When these two jobs finish, I just merge the two VCFs together and I've got a complete data set in half the time".

But VCF files have headers, so how can I merge these VCF files automatically? BAM files have the same problem. I found "MergeSamFiles" in Picard; is it a solution for merging BAM files? Will it handle BAM files with different headers?

Second, to specify multiple chromosomes, should I use

-L chr1 chr2 chr3

or

-L chr1
-L chr2

Thanks!

gatk • 4.0k views

What kind of computational resources do you have access to? If you have a compatible cluster, you might want to check out GATK Queue (http://www.broadinstitute.org/gatk/guide/topic?name=tutorials), which can handle scatter/gather automatically.

12.3 years ago

The two VCFs can be merged using VCFtools: http://vcftools.sourceforge.net/docs.html#merge
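For the scatter/gather case in the question, where both VCFs contain the same samples and cover disjoint regions, the gather step is really a concatenation: keep one copy of the header and append the records in region order. VCFtools ships `vcf-concat` for this, while `vcf-merge` is aimed at combining files with different samples. The sketch below shows the same idea with plain `grep`, assuming the shards have identical headers and are passed in genomic order. `concat_vcfs` is a hypothetical helper name, not part of GATK or VCFtools.

```shell
#!/bin/sh
# Sketch only: gather VCF shards produced by a scatter over disjoint regions.
# Assumptions: every shard has the same header (same samples, same order)
# and the shards are listed in genomic order.
# concat_vcfs is a hypothetical helper, not a GATK or VCFtools command.
concat_vcfs() {
    # Header lines (those starting with '#') are taken from the first shard only.
    grep '^#' "$1" || true
    # Records from every shard, in the order given on the command line.
    for shard in "$@"; do
        grep -v '^#' "$shard" || true   # an empty shard is not an error
    done
}

# Usage with the file names from the question:
#   concat_vcfs my.1.vcf my.2.vcf > my.vcf
```

If the headers differ between shards (for example different samples), this simple approach is not appropriate and a real merge tool is needed.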

If you want to handle several regions, as far as I know GATK uses this kind of list: http://www.broadinstitute.org/gsa/wiki/index.php/Input_files_for_the_GATK#Intervals
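To make the two options concrete: GATK accepts the `-L` flag repeated once per region (`-L chr1 -L chr2`), and it also accepts a text file with one interval per line passed as a single `-L` argument. The sketch below builds both forms. The helper names `build_L_args` and `write_intervals` and the file name `regions.intervals` are made up for illustration; the `.intervals` extension and the one-interval-per-line layout are my understanding of the GATK-style list format, so check them against the intervals documentation linked above.

```shell
#!/bin/sh
# Sketch only: two ways to hand several regions to GATK.
# build_L_args and write_intervals are hypothetical helpers.

# Print one "-L region" pair per argument, e.g. "-L chr1 -L chr2".
build_L_args() {
    args=""
    for region in "$@"; do
        args="$args -L $region"
    done
    printf '%s\n' "${args# }"   # strip the leading space
}

# Write one interval per line to the file named by the first argument.
write_intervals() {
    out=$1
    shift
    : > "$out"                  # create or truncate the file
    for region in "$@"; do
        printf '%s\n' "$region" >> "$out"
    done
}

# Usage (commands modeled on the ones in the question, not run here):
#   java -jar GenomeAnalysisTK.jar -R human.fasta -T UnifiedGenotyper \
#        -I my.bam $(build_L_args chr1 chr2 chr3) -o my.vcf
#
#   write_intervals regions.intervals chr1 chr2 chr3:1-1000000
#   java -jar GenomeAnalysisTK.jar -R human.fasta -T UnifiedGenotyper \
#        -I my.bam -L regions.intervals -o my.vcf
```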
