Hello,
I have 11 samples from a custom TruSeq design.
I ran HaplotypeCaller on each of the 11 samples:
java \
-jar /home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R /home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta \
-ERC BP_RESOLUTION \
-I $file/alineamiento/recal_reads.bam \
-L ../../../../data/bed/disenio_ENP_illumina2_ordenado.bed \
--genotyping_mode DISCOVERY \
-stand_emit_conf 10 \
-stand_call_conf 30 \
-o $file/alineamiento/raw_variants_bed.gvcf \
-variant_index_type LINEAR \
-variant_index_parameter 128000
and then ran joint genotyping on all 11 samples:
java \
-jar /home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R /home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta \
-D /home/horus/Escritorio/GATK/GATK/2.8/b37/dbsnp_138.b37.vcf \
--variant ../Analisis/SAR093-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR094-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR095-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR096-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR097-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR098-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR099-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR100-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR101-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR102-2015/alineamiento/raw_variants_bed.gvcf \
--variant ../Analisis/SAR103-2015/alineamiento/raw_variants_bed.gvcf \
-o output.vcf
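The --variant list above could also be built in a loop instead of being typed out. This is only a minimal sketch: it assumes the sample IDs run consecutively from SAR093-2015 to SAR103-2015, as in the command above, and it only prints the resulting command (a dry run) rather than running GATK.

```shell
# Sketch only: build the --variant arguments for the 11 gVCFs listed above.
# Assumption: sample IDs are consecutive, SAR093-2015 .. SAR103-2015.
variant_args=""
i=93
while [ "$i" -le 103 ]; do
  id=$(printf 'SAR%03d-2015' "$i")
  variant_args="$variant_args --variant ../Analisis/$id/alineamiento/raw_variants_bed.gvcf"
  i=$((i + 1))
done

# Dry run: print the command. Remove the leading "echo" to actually run it.
# $variant_args is left unquoted on purpose so it splits into separate arguments
# (safe here because none of the paths contain spaces).
echo java \
  -jar /home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar \
  -T GenotypeGVCFs \
  -R /home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta \
  -D /home/horus/Escritorio/GATK/GATK/2.8/b37/dbsnp_138.b37.vcf \
  $variant_args \
  -o output.vcf
```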
After joint genotyping I have output.vcf, a multi-sample VCF. I need to filter the variants (hard filtering), and I don't know which of these is the proper approach:
- Apply hard filters directly to this multi-sample VCF
- Split the multi-sample VCF by sample and apply the filters to each individual VCF (which seems odd to me, since after the split the INFO column is the same in every VCF)
or
- Apply hard filters to each sample's VCF first, and then do the joint genotyping step?
Thanks
Hello @nchuang, I'm following the GATK Best Practices. VQSR needs a lot of samples (more than 30 exomes). I only have 11 samples, from targeted sequencing, so I have to stick to hard filters.
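For concreteness, the first option (hard-filtering the multi-sample output.vcf directly) might look roughly like the sketch below with GATK 3.x, using SelectVariants and VariantFiltration. The thresholds are the generic starting values from the GATK hard-filtering documentation and are not tuned for this panel. The output file names, the filter names, and the run_gatk wrapper are made up for illustration. By default the script only prints the commands.

```shell
# Sketch only: hard-filter the joint-genotyped VCF, SNPs and indels separately.
GATK=/home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar
REF=/home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta

# Generic starting thresholds from the GATK documentation (assumption: untuned).
snp_filter='QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'
indel_filter='QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0'

# Hypothetical helper: prints the command unless RUN=1 is set.
# Note: the printed form is not shell-quoted, so it is for inspection only.
run_gatk() {
  if [ "${RUN:-0}" = 1 ]; then
    java -jar "$GATK" "$@"
  else
    printf 'java -jar %s' "$GATK"
    printf ' %s' "$@"
    printf '\n'
  fi
}

# SNPs: extract, then flag records that fail the expression.
run_gatk -T SelectVariants -R "$REF" -V output.vcf -selectType SNP -o raw_snps.vcf
run_gatk -T VariantFiltration -R "$REF" -V raw_snps.vcf \
  --filterExpression "$snp_filter" --filterName "snp_hard_filter" \
  -o filtered_snps.vcf

# Indels: same pattern with the indel thresholds.
run_gatk -T SelectVariants -R "$REF" -V output.vcf -selectType INDEL -o raw_indels.vcf
run_gatk -T VariantFiltration -R "$REF" -V raw_indels.vcf \
  --filterExpression "$indel_filter" --filterName "indel_hard_filter" \
  -o filtered_indels.vcf
```

Running it with RUN=1 set in the environment would execute the commands instead of printing them. VariantFiltration does not remove failing records; it writes the filter name into the FILTER column, so the multi-sample structure of the file is kept.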