20 months ago
rj.rezwan
Hi, I have 64 accessions, each with a *.g.vcf output file from HaplotypeCaller in GATK. Now I want to combine them into one file. I am using CombineGVCFs, but I cannot get the output and the run ends with an error. Can someone suggest the best way to combine them, or is there an issue with CombineGVCFs when combining files from many samples? The code is here:
#!/bin/bash
#
#SBATCH --job-name=combine_files
#SBATCH --output=combine_files.%j.out
#SBATCH --partition=batch
#SBATCH --cpus-per-task=20
#SBATCH --time=100:00:00
#SBATCH --mem=600G
module load gatk/4.1.2.0
ref_dir=(~/path/PitayaGenomic.fa)
gvcf_dir=(~/path/*.g.vcf)
gatk CombineGVCFs -R ${ref_dir} $(printf -- '--variant %s ' "${gvcf_dir[@]}") -O joint_files.g.vcf
Without the error message, it is hard to say where the problem is. You could try increasing memory (--java-options "-Xmx50g", for example). Just as a remark, GATK recommends using GenomicsDBImport instead of CombineGVCFs, especially when dealing with large numbers of files (more than 1000). But I don't think that is your case here.
You have to comment on and/or validate the answers to all your other questions: is it mandatory to use fixmate? ; does warning has an impact on the markduplication output file? ; why sorted.bam file size is increased after using fixmate in picard ; Apple gene ID conversion tool ; Submission of RNA-seq data in NCBI Sequence Read Archive (SRA) ; Gene ontology and KEGG pathway analysis of rice RNA-seq data
Sure, I will comment within a few days and mark the suitable answers, which should be useful for new users.
Not just a comment. Validate (green mark on the left) the good answers for your previous questions.
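For anyone landing here later, the two suggestions in the answer above (more Java heap, or GenomicsDBImport) could be sketched roughly as below. This is only a sketch under assumptions, not a tested pipeline: the helper names build_combine_cmd and build_import_cmd, the workspace name pitaya_db and the interval chr1 are made up for illustration, the paths are the placeholders from the question, and 50g is just the example value from the answer. As far as I know, GenomicsDBImport needs at least one interval (-L), so you would replace chr1 with a real contig name from your reference. The script only prints the commands so they can be checked before submitting the job.

```bash
#!/bin/bash
# Hypothetical helpers: each one fills the global array `cmd` with a GATK call.
# Building the arguments in an array keeps paths with spaces intact.

# Usage: build_combine_cmd <reference.fa> <output.g.vcf> <heap> <in1.g.vcf> [in2.g.vcf ...]
build_combine_cmd() {
    local ref=$1 out=$2 heap=$3
    shift 3
    cmd=(gatk --java-options "-Xmx${heap}" CombineGVCFs -R "$ref")
    local f
    for f in "$@"; do
        cmd+=(--variant "$f")   # one --variant per input gVCF
    done
    cmd+=(-O "$out")
}

# Usage: build_import_cmd <workspace_dir> <interval> <heap> <in1.g.vcf> [in2.g.vcf ...]
build_import_cmd() {
    local workspace=$1 interval=$2 heap=$3
    shift 3
    cmd=(gatk --java-options "-Xmx${heap}" GenomicsDBImport
         --genomicsdb-workspace-path "$workspace" -L "$interval")
    local f
    for f in "$@"; do
        cmd+=(-V "$f")          # one -V per input gVCF
    done
}

# Placeholder paths from the question; adjust to your own data.
gvcfs=(~/path/*.g.vcf)

# Option 1: CombineGVCFs with an explicit heap size (50g is only an example).
build_combine_cmd ~/path/PitayaGenomic.fa joint_files.g.vcf 50g "${gvcfs[@]}"
printf '%q ' "${cmd[@]}"; echo

# Option 2: GenomicsDBImport (pitaya_db and chr1 are hypothetical names).
build_import_cmd pitaya_db chr1 50g "${gvcfs[@]}"
printf '%q ' "${cmd[@]}"; echo

# Once a printed command looks right, run it with: "${cmd[@]}"
```

If the job still fails, posting the last lines of the combine_files.<jobid>.out log would help narrow it down.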