6.2 years ago
jc.szamosi
I am using the Mutect2 program from GATK4 to call somatic SNPs in mouse cancer cells, and I want to use the Sanger Mouse Genomes Project's strain-specific VCF file (ftp://ftp-mouse.sanger.ac.uk/) for the --germline-resource argument, but this argument requires that the VCF file have a POP_AF INFO tag, which the Sanger VCF file does not have. Is there a similar germline SNP file that I could use that includes this information?
Maybe this is of assistance:
https://github.com/igordot/genomics/blob/master/workflows/gatk-mouse-mm10.md
The link to the NCBI vcfs in that tutorial is broken, but I'll see if I can find one that works and edit this comment.
Edit: The NCBI vcf files are here: ftp://ftp.ncbi.nih.gov/snp/organisms/archive/mouse_10090/VCF/, but they also don't have POP_AF INFO tags, so they won't work for this purpose. Many thanks, however.
The human population frequencies come from large-scale studies with thousands of samples. I am not sure anything like that exists for other species. There is obviously a lot of mouse sequencing data available, but I don't think there is any organized version.
Thanks! Do you know of a way to make the snp file work with GATK4 if the POP_AF tag is absent?
Technically, that is an optional parameter, so you could skip it.
If you cannot find the population frequencies, then there is not much you can do.
I can skip it, and that's what I've done for now. For technical reasons, my tumor and normal samples need to come from different individuals. I've created a PON from all my normal individuals, but I was hoping for a strain vcf so that I can try to distinguish between among-individual variation in the germline, and actual somatic mutations.
I suppose I could add a fake POP_AF tag to the strain vcf.... I'll have to read more about how that tag is used first, though.
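A minimal sketch of what adding a placeholder POP_AF tag could look like, in plain Python with no VCF library. The function name `add_pop_af` and the default value of 0.5 are hypothetical and chosen only for illustration; the value is not a measured allele frequency, and whether Mutect2 behaves sensibly with such a placeholder is exactly the open question in this thread.

```python
def add_pop_af(vcf_lines, placeholder_af=0.5):
    """Return VCF lines with a POP_AF INFO tag added to every record.

    placeholder_af is an assumed stand-in value, NOT a real frequency.
    One value is written per ALT allele (Number=A in the header).
    """
    header_line = (
        '##INFO=<ID=POP_AF,Number=A,Type=Float,'
        'Description="Placeholder population allele frequency (not measured)">'
    )
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Existing meta-information lines pass through unchanged.
            out.append(line)
        elif line.startswith("#CHROM"):
            # Declare the new tag just before the column header line.
            out.append(header_line)
            out.append(line)
        elif line:
            fields = line.split("\t")
            n_alt = len(fields[4].split(","))
            tag = "POP_AF=" + ",".join(str(placeholder_af) for _ in range(n_alt))
            # INFO is column 8; "." means the field was empty.
            fields[7] = tag if fields[7] in (".", "") else fields[7] + ";" + tag
            out.append("\t".join(fields))
    return out
```

For a real file, the lines could be read from an uncompressed VCF and written back out, then compressed and indexed again before passing the result to GATK.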
If your goal is to discard germline variants, I would suggest annotating the Mutect2 output against dbSNP (or whichever file you are interested in) using a tool like SnpSift and then filtering those variants out.
However, you can always go back to GATK3, in which the dbsnp parameter is active.
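The filtering idea in this answer can be sketched in plain Python as a stand-in for the SnpSift step. This is not SnpSift itself: the helper names `site_keys` and `drop_known_germline` are hypothetical, and the sketch assumes both files use the same chromosome naming (for example `1` versus `chr1`) and the same representation of indels.

```python
def site_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from a VCF of known germline sites."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele gets its own key.
        for allele in alt.split(","):
            keys.add((chrom, pos, ref, allele))
    return keys


def drop_known_germline(call_lines, known):
    """Keep header lines and any call whose ALT alleles are all absent from `known`."""
    kept = []
    for line in call_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            kept.append(line)
            continue
        if not line:
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if any((chrom, pos, ref, a) in known for a in alt.split(",")):
            continue  # matches a known germline site, so discard it
        kept.append(line)
    return kept
```

Matching on the exact alleles rather than on position alone avoids discarding a somatic call that merely shares a coordinate with a different known variant.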
I'm using GATK4 to call mouse tumor variants, too. For the same reason, I skip the germline resource. But I think this may be what we need: [enter link description here]. You can find this in the README:
Besides, I would like to know which file you used for the -v argument in GetPileupSummaries. It seems that GetPileupSummaries also needs a VCF file that has a MAF tag.
Hi, I am facing the same problem, as I am analyzing tumor samples from B6 mice and don't have matched normals for all my samples. Could you find a germline resource or PON compatible with B6 mice? Also, as I understand it, the --dbsnp option is useless in GATK4, so I haven't found a way so far to make use of the Sanger 137snp.vcf file. Did you find a way around it, or what did you do for your samples in the end?