What is the recommended way to merge the resulting VCF files when running GATK UnifiedGenotyper version 1.6 on chunks of, say, 10 Mb?
I've got files like this:
chr10.0001.vcf
chr10.0002.vcf
chr10.0003.vcf
chr10.0004.vcf
chr10.0005.vcf
chr10.0006.vcf
chr10.0007.vcf
chr10.0008.vcf
chr10.0009.vcf
chr10.0010.vcf
chr10.0011.vcf
chr10.0012.vcf
chr10.0013.vcf
chr10.0014.vcf
where the first file covers the first 10 Mb, the second covers the following 10 Mb, and so on. I want to end up with a single chr10.vcf file that includes all of the above.
Is it enough to just run
find -name "chr10.*" | sort | xargs cat
on the files?
GATK also provides a tool, CatVariants, that does a smart concatenation of variants: it is faster than CombineVariants but safer than plain cat. See http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_tools_CatVariants.html
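For comparison with plain cat: every chunk VCF carries its own header (the ## meta-information lines and the #CHROM line), so cat-ing the chunks leaves header lines in the middle of the records, which many VCF parsers will reject. Below is a minimal bash sketch of a header-aware concatenation, assuming uncompressed chunks that share an identical header and sample columns, are each already sorted, and do not overlap. The function name concat_vcf_chunks is hypothetical. This is not how CatVariants is implemented, only an illustration of the minimum that "safer than cat" has to handle.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): concatenate per-chunk VCFs into one file.
# Usage: concat_vcf_chunks OUT CHUNK1 CHUNK2 ...   (chunks given in genomic order)
# Assumes uncompressed VCFs with identical headers and sample columns,
# each chunk already sorted by POS and no two chunks overlapping.
concat_vcf_chunks() {
  local out=$1; shift
  # Keep the header (## meta lines plus the #CHROM line) from the first chunk only.
  grep '^#' "$1" > "$out" || return 1
  # Append the data lines of every chunk in the order given, dropping their headers.
  local f
  for f in "$@"; do
    # grep exits 1 when a chunk has no records (fine), 2 on a real error (abort).
    grep -v '^#' "$f" >> "$out" || [ $? -eq 1 ] || return 1
  done
  # Sanity check: POS (column 2) must never decrease, including across chunk boundaries.
  # The function's exit status is awk's: 0 if sorted, 1 otherwise.
  awk -F'\t' -v prev=0 '!/^#/ {
    if ($2 + 0 < prev) { print "POS decreases at line " NR > "/dev/stderr"; exit 1 }
    prev = $2 + 0
  }' "$out"
}
```

For the files in the question this would be called as concat_vcf_chunks chr10.vcf chr10.[0-9][0-9][0-9][0-9].vcf. The zero-padded names make the shell's lexicographic glob order match genomic order, and the narrower pattern does not match chr10.vcf itself, whereas find -name "chr10.*" would pick up the output file on a second run. The final awk step only checks that POS never decreases; it does not merge or deduplicate records that might appear at chunk boundaries.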