GATK: joint genotyping then filtering, or filtering then joint genotyping?
9.1 years ago

Hello,

I have 11 samples from a custom TruSeq design, and I ran HaplotypeCaller on each of them:

java \
  -jar /home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar \
  -T HaplotypeCaller \
  -R /home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta \
  -ERC BP_RESOLUTION \
  -I $file/alineamiento/recal_reads.bam \
  -L ../../../../data/bed/disenio_ENP_illumina2_ordenado.bed \
  --genotyping_mode DISCOVERY \
  -stand_emit_conf 10 \
  -stand_call_conf 30 \
  -o $file/alineamiento/raw_variants_bed.gvcf \
  -variant_index_type LINEAR \
  -variant_index_parameter 128000

and then ran joint genotyping on all 11 samples:

java \
  -jar /home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar \
  -T GenotypeGVCFs \
  -R /home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta \
  -D /home/horus/Escritorio/GATK/GATK/2.8/b37/dbsnp_138.b37.vcf \
  --variant ../Analisis/SAR093-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR094-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR095-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR096-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR097-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR098-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR099-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR100-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR101-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR102-2015/alineamiento/raw_variants_bed.gvcf \
  --variant ../Analisis/SAR103-2015/alineamiento/raw_variants_bed.gvcf \
  -o output.vcf
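The eleven --variant lines above can also be generated in a loop, which is less error-prone when samples are added. This is a minimal sketch that assumes the sample IDs run consecutively from SAR093-2015 to SAR103-2015 and that every gVCF follows the same path pattern as in the command; the array name variant_args is purely illustrative.

```shell
#!/usr/bin/env bash
# Build the repeated "--variant <gVCF>" arguments for GenotypeGVCFs in a loop.
# Assumption: IDs are consecutive (SAR093-2015 .. SAR103-2015) and each gVCF
# lives at ../Analisis/<ID>/alineamiento/raw_variants_bed.gvcf, as above.
variant_args=()
for n in $(seq 93 103); do
  id=$(printf 'SAR%03d-2015' "$n")
  variant_args+=(--variant "../Analisis/${id}/alineamiento/raw_variants_bed.gvcf")
done

# Show the generated arguments, one "--variant <path>" pair per line.
printf '%s %s\n' "${variant_args[@]}"

# The array would then replace the eleven --variant lines, e.g.:
#   java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R human_g1k_v37.fasta \
#     -D dbsnp_138.b37.vcf "${variant_args[@]}" -o output.vcf
```

Quoting "${variant_args[@]}" keeps each path as a single argument even if a directory name ever contains spaces.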

After joint genotyping I have output.vcf, a multisample VCF. I need to filter the variants (hard filtering), and I don't know which approach is correct:

  • Just apply hard filters to this multi sample VCF
  • Split the multisample vcf by sample, and apply filters to each individual VCF (which I think is weird, since after the split the INFO column in every VCF is the same)

or

  • Apply hard filters to each VCF first, and then do the joint genotyping step?

Thanks

9.1 years ago
nchuang ▴ 260

Maybe I'm a bit confused, but are you following the GATK Best Practices?

I haven't checked them in a while, but I believe that after joint genotyping you run VQSR, then the genotype refinement steps, and only then filter. I think the idea is to filter only once you are done processing the data. Is this human data? Why do you need hard filtering?

In any case, you do not need to split by sample to filter (where did you get that idea?).
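Why splitting by sample does not help can be seen from where the filtered annotations live. Below is a toy illustration on a made-up two-sample VCF (not real data): site-level annotations such as QD and FS sit in the INFO column, one value per site, so a site-level filter gives one verdict per record regardless of how many samples it contains. The QD < 2.0 cut-off is taken from GATK's generic hard-filtering recommendations and is used here only as an example.

```shell
#!/usr/bin/env bash
# Toy VCF (made-up values) with two samples, S1 and S2.
toy_vcf=$(mktemp)
trap 'rm -f "$toy_vcf"' EXIT
printf '%s\n' \
  '##fileformat=VCFv4.1' \
  $'#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2' \
  $'1\t100\t.\tA\tG\t500\t.\tAC=1;AN=4;QD=15.2;FS=1.1\tGT:DP\t0/1:30\t0/0:25' \
  $'1\t200\t.\tC\tT\t40\t.\tAC=2;AN=4;QD=1.3;FS=70.5\tGT:DP\t0/1:12\t0/1:9' \
  > "$toy_vcf"

# For each site, read QD from INFO (column 8) and apply QD < 2.0.
# The verdict belongs to the site: splitting by sample would copy the same
# INFO, and therefore the same verdict, into every single-sample VCF.
result=$(awk -F'\t' '
  /^#/ { next }
  {
    qd = "NA"
    n = split($8, kv, ";")
    for (i = 1; i <= n; i++) if (kv[i] ~ /^QD=/) qd = substr(kv[i], 4)
    verdict = (qd != "NA" && qd + 0 < 2.0) ? "FAIL_QD" : "PASS"
    print $1 ":" $2, "samples=" (NF - 9), "QD=" qd, verdict
  }' "$toy_vcf")
echo "$result"
```

Per-sample fields (GT, DP in columns 10 onward) are untouched by these site-level filters; filtering individual genotypes would be a separate step.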


Hello @nchuang, I am following the GATK Best Practices. VQSR needs a lot of samples, more than 30 exomes. I have only 11 samples from targeted sequencing, so I have to stick to hard filters.
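With hard filters, the usual GATK 3.x pattern is to filter the joint-genotyped multisample VCF directly, splitting it by variant type (SNPs vs. indels) rather than by sample. The sketch below assumes GATK 3.x SelectVariants/VariantFiltration syntax matches the 3.4 jar used above and uses GATK's generic starting thresholds, which are not tuned for a targeted panel. The RUN switch, the run_gatk helper, and the filter names are hypothetical additions; by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Paths copied from the question; adjust as needed.
GATK_JAR=/home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar
REF=/home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta

# GATK's generic hard-filter expressions (GATK 3 era); treat as a starting point.
SNP_FILTER='QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'
INDEL_FILTER='QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0'

# Hypothetical helper: execute when RUN=1, otherwise print a dry-run command.
run_gatk() {
  local cmd=(java -jar "$GATK_JAR" "$@")
  if [ "${RUN:-0}" = 1 ]; then
    "${cmd[@]}"
  else
    printf '%q ' "${cmd[@]}"; echo
  fi
}

# 1. Split the multisample VCF by variant type (not by sample).
run_gatk -T SelectVariants -R "$REF" -V output.vcf -selectType SNP -o raw_snps.vcf
run_gatk -T SelectVariants -R "$REF" -V output.vcf -selectType INDEL -o raw_indels.vcf

# 2. Mark failing sites in the FILTER column (records are kept, not deleted).
run_gatk -T VariantFiltration -R "$REF" -V raw_snps.vcf \
  --filterExpression "$SNP_FILTER" --filterName snp_hard_filter -o filtered_snps.vcf
run_gatk -T VariantFiltration -R "$REF" -V raw_indels.vcf \
  --filterExpression "$INDEL_FILTER" --filterName indel_hard_filter -o filtered_indels.vcf
```

Running it with RUN=1 executes the four steps; sites failing an expression keep their record but get the filter name in FILTER instead of PASS, so all 11 samples stay together in one file.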
