Using GATK4 Mutect2 on mouse data, need a genome snp reference
6.2 years ago
jc.szamosi ▴ 50

I am using Mutect2 from GATK4 to call somatic SNPs in mouse cancer cells, and I want to use the Sanger Mouse Genomes Project's strain-specific VCF file (ftp://ftp-mouse.sanger.ac.uk/) for the --germline-resource argument. However, this argument requires the VCF file to have a POP_AF INFO tag, which the Sanger VCF file lacks. Is there a similar germline SNP file I could use that includes this information?

GATK mutect2 mouse SNP vcf • 5.4k views

The link to the NCBI vcfs in that tutorial is broken, but I'll see if I can find one that works and edit this comment.

Edit: The NCBI VCF files are here: ftp://ftp.ncbi.nih.gov/snp/organisms/archive/mouse_10090/VCF/, but they don't have POP_AF INFO tags either, so they won't work for this purpose. Many thanks all the same.


The human population frequencies come from large-scale studies with thousands of samples. I am not sure anything like that exists for other species. There is obviously a lot of mouse sequencing data available, but I don't think there is any organized version.


Thanks! Do you know of a way to make the SNP file work with GATK4 if the POP_AF tag is absent?


Technically, that is an optional parameter, so you could skip it.

(iii) Mutect2 also differs from the HaplotypeCaller in that it can apply various prefilters to sites and variants depending on the use of a matched normal (--normal-sample), a panel of normals (PoN; --panel-of-normals) and/or a common population variant resource containing allele-specific frequencies (--germline-resource). If provided, Mutect2 uses the PoN to filter sites and the germline resource and matched normal to filter alleles.

If you cannot find population frequencies, then there is not much you can do.


I can skip it, and that's what I've done for now. For technical reasons, my tumor and normal samples need to come from different individuals. I've created a PON from all my normal individuals, but I was hoping for a strain vcf so that I can try to distinguish between among-individual variation in the germline, and actual somatic mutations.

I suppose I could add a fake POP_AF tag to the strain vcf.... I'll have to read more about how that tag is used first, though.
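For what it's worth, the "fake tag" idea could look roughly like the sketch below. It only shows the mechanics of appending a constant INFO tag to every record of a plain-text VCF. The function name `add_placeholder_af` and the default value of 0.5 are made up for illustration, and whether a constant frequency is meaningful to Mutect2's germline filtering is exactly the open question above. The tag name is a parameter because the tag that --germline-resource expects should be checked against the GATK version in use.

```python
def add_placeholder_af(vcf_lines, tag="POP_AF", value=0.5):
    """Yield VCF lines with a constant allele-frequency INFO tag added.

    Assumption: `value` is an arbitrary placeholder, not a measured frequency.
    Assumption: the input header does not already declare `tag`.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Pass existing meta-information lines through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Declare the new tag just before the column header line.
            yield (f'##INFO=<ID={tag},Number=A,Type=Float,'
                   f'Description="Placeholder allele frequency, not measured">')
            yield line
        elif line:
            fields = line.split("\t")
            # Number=A means one value per ALT allele.
            n_alt = len(fields[4].split(","))
            entry = f"{tag}=" + ",".join(str(value) for _ in range(n_alt))
            info = fields[7]
            fields[7] = entry if info in (".", "") else f"{info};{entry}"
            yield "\t".join(fields)
```

To use it, open the uncompressed strain VCF, pass the file object in, and write each yielded line followed by a newline. The result would still need to be compressed and indexed before GATK will accept it.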


If your goal is to discard germline variants, I would suggest annotating the Mutect2 output with dbSNP (or another file of your choice) using a tool like SnpSift, and then filtering the annotated variants out.

Alternatively, you can always go back to GATK3, in which the --dbsnp parameter is still active.
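The annotate-then-filter idea can also be sketched without SnpSift. The snippet below is a minimal illustration, not SnpSift's behaviour: it drops any call that has an ALT allele present in a known germline VCF. The helper names `variant_keys` and `drop_known_germline` are hypothetical. It assumes both files use the same reference build, the same chromosome naming and the same variant representation, and that the known-variant set fits in memory, which may not hold for a full strain VCF.

```python
def variant_keys(vcf_lines):
    """Yield (chrom, pos, ref, alt) for every ALT allele in a VCF."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt)


def drop_known_germline(call_lines, known_keys):
    """Yield header lines, plus records with no ALT allele in known_keys.

    A record is dropped as soon as any one of its ALT alleles is known.
    """
    for line in call_lines:
        if line.startswith("#"):
            yield line.rstrip("\n")
            continue
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        keys = {(f[0], int(f[1]), f[3], alt) for alt in f[4].split(",")}
        if keys.isdisjoint(known_keys):
            yield "\t".join(f)
```

In practice you would build `known_keys = set(variant_keys(open_strain_vcf))` once and stream the Mutect2 output through `drop_known_germline`. Matching on exact position and alleles is strict, so indels that are represented differently in the two files would slip through.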


I'm using GATK4 to call mouse tumor variants, too. For the same reason, I skip the germline resource. But I think this may be what we need: [link]. You can find this in the README:

If available for this species, the file includes information on:
- ancestral_allele
- evidence
- clinical_significance
- global minor allele, frequency and count

Also, I would like to know which file you used for -V in GetPileupSummaries. It seems that GetPileupSummaries also needs the VCF file to have a MAF tag.


Hi, I am facing the same problem, as I am analyzing tumor samples from B6 mice and don't have matched normals for all my samples. Did you find a germline resource or PON compatible with B6 mice? Also, as I understand it, the --dbsnp option is no longer usable in GATK4, so I haven't found a way to make use of the Sanger 137snp.vcf file. Did you find a way around it, or what did you do for your samples in the end?
