GATK4.3 ApplyBQSR
3
0
Entering edit mode
21 months ago
wonde2000 • 0

Dear All, I hope everyone is doing well. When I run ApplyBQSR some of my samples generate all the chromosomes with MT as indicated below while the other samples came from the same breed with a similar pipeline and missed MT. I am wondering about the output to proceed to the next step. Thank you in advance for your valuable suggestions!

Here is the code I used

gatk ApplyBQSR -R ../reference/Bos_taurus.ARS-UCD1.2.dna.toplevel.fa -I ../data/mapped/ABigar/Abi1_dedup.bam --bqsr-recal-file ../data/mapped/ABigar/AB_haplo/Abi1_recal.table1 -O ../data/mapped/ABigar/AB_haplo/Abi1_recal.bam

the last portion of the result is posted below.

22:40:14.206 INFO  ProgressMeter -          X:126181720            117.3             211182000        1800853.9
22:40:24.231 INFO  ProgressMeter -          X:128930737            117.4             211409000        1800224.7
22:40:34.253 INFO  ProgressMeter -          X:132515206            117.6             211701000        1800150.8
22:40:44.264 INFO  ProgressMeter -          X:136218098            117.8             211997000        1800113.8
22:40:54.265 INFO  ProgressMeter -              MT:2801            117.9             212241000        1799638.8
22:41:04.266 INFO  ProgressMeter - NKLS02002208.1:4545561            118.1             212535000        1799588.0
22:41:14.268 INFO  ProgressMeter - NKLS02000910.1:966969            118.3             212839000        1799621.9
22:41:24.278 INFO  ProgressMeter - NKLS02002210.1:706467            118.4             213084000        1799155.7
23:01:39.103 INFO  ProgressMeter -  NKLS02001671.1:4679            138.7             250021000        1802828.3
23:01:49.105 INFO  ProgressMeter -  NKLS02001453.1:1923            138.8             250290000        1802601.2
23:01:59.121 INFO  ProgressMeter -             unmapped            139.0             250614000        1802767.2
23:02:09.153 INFO  ProgressMeter -             unmapped            139.2             250946000        1802986.9
23:02:19.179 INFO  ProgressMeter -             unmapped            139.4             251233000        1802884.5
23:02:29.200 INFO  ProgressMeter -             unmapped            139.5             251575000        1803177.5
23:02:39.225 INFO  ProgressMeter -             unmapped            139.7             251914000        1803447.6
23:02:49.245 INFO  ProgressMeter -             unmapped            139.9             252181000        1803203.2
23:02:49.837 WARN  IntelInflater - Zero Bytes Written : 0
23:02:49.840 INFO  ApplyBQSR - 0 read(s) filtered by: WellformedReadFilter

23:02:49.840 INFO  ProgressMeter -             unmapped            139.9             252199945        1803210.8
23:02:49.840 INFO  ProgressMeter - Traversal complete. Processed 252199945 total reads in 139.9 minutes.
23:02:50.929 INFO  ApplyBQSR - Shutting down engine
[22 February 2023 at 23:02:50 CET] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 139.90 minutes.
Runtime.totalMemory()=2579496960
ApplyBQSR • 2.0k views
ADD COMMENT
0
Entering edit mode

Very naive question but are you sure you have reads on MT in all your alignment files?

ADD REPLY
0
Entering edit mode

Yes of course Rapha! I confirmed it is ok. Now, the question is how to extract only MT and call SNP and indels of MT for diversity study?

ADD REPLY
0
Entering edit mode
21 months ago
LChart 4.6k

What have you done to check that some of your samples are in fact missing reads on MT? This would seem a little bit strange. For such a sample, what's the result of samtools view -L MT $bam? I suspect that you're assuming if MT is missing from an ApplyBQSR log file then it wasn't processed -- it only means that it was processed quickly enough that the traversal moved past MT in the time/reads between logging.

ADD COMMENT
0
Entering edit mode

Dear LChart, exactly it pases quickly when generating the log file. I check the recalibrated bam file using Samtools view Bo4_sorted.bam MT | less –S the MT regions are available. Instead of trying samtools view -L MT $bam, I am planning to run Haplotypecaller with -L MT and follow GATK procedures to the end and that may generate SNP and indels for only specified region (MT). Any suggestion? Thanks!

ADD REPLY
0
Entering edit mode

My suggestion is to follow the Mitochondrial workflow here: https://github.com/gatk-workflows/gatk4-mitochondria-pipeline

Notably:

  • The mitochondria are circular chromosomes, so you have to deal with the improper breakpoints in the reference; this is typically done using an 'extended' linear chromosome. The above workflow handles this.
  • Mitochondria reflect a pool of many lineages, so forcing haploidy or k-ploidy may not be appropriate. It's closer to pooled or tumor sequencing; which is why the --mitochondria mode exists in MuTect; so MuTect is preferable to HaplotypeCaller
ADD REPLY
0
Entering edit mode
21 months ago
wonde2000 • 0

Dear LChart, Thank you so much for your prompt and remarkable response. As I mentioned I am quite new for GATK and when I read Mutect2 I found the following explanation and command.

(iii) Mitochondrial mode Mutect2 automatically sets parameters appropriately for calling on mitochondria with the --mitochondria flag. Specifically, the mode sets --initial-tumor-lod to 0, --tumor-lod-to-emit to 0, --af-of-alleles-not-in-resource to 4e-3, and the advanced parameter --pruning-lod-threshold to -4.

gatk Mutect2 \ -R reference.fa \ -L chrM \ --mitochondria \ -I mitochondria.bam \ -O mitochondria.vcf.gz

My question here is

  1. Am I supposed to first extract MT sequence from recalibrated WGS bam file for all samples in a separate bam file?
  2. I think the command is for a single sample, so how can I combined all samples after running the above command for individual samples?
  3. Is the above command is enough to call final MT variants for downstream analysis? or
  4. After generating mitochondria.vcf.gz for each samples, CombineGVCFs and GenotypeGVCFs are mandatory or any other step is there in Mutect2?

Finally, I have also read a pipeline used to call MT using Haplotypecaller https://www.frontiersin.org/articles/10.3389/fgene.2022.692257/full#supplementary-material

Thank you once again for your extraordinary effort!

ADD COMMENT
0
Entering edit mode

Take a look through the pipeline I linked, which will provide insights to all 4 of your questions.

ADD REPLY
0
Entering edit mode
ADD COMMENT

Login before adding your answer.

Traffic: 2644 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6