I have targeted panel sequencing data from C57 mice and want to call somatic variants using GATK. Because of cost, only 4 tumour samples have a matched normal; the remaining 16 tumour samples have no matched normal.
Q1) Should I use the latest assembly, GRCm39, as the reference for bwa, or GRCm38? My concern is that the files needed in later steps might not be available for GRCm39. For example, dbSNP is available for GRCm38, but I cannot find a GRCm39 version.
Q2) Should I process the matched and unmatched tumour samples differently in Mutect2, i.e. tumour with matched normal mode for the 4 T-N matched samples and tumour-only mode for the remaining 16 samples?
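To make Q2 concrete, this is roughly how I understand the two invocations would differ. It is only a dry-run sketch that prints the commands rather than running GATK; the wrapper functions, BAM names, sample name and `reference.fa` are placeholders I made up, and I have left out `--germline-resource` and the PON because those are the files I cannot find yet.

```shell
#!/bin/sh
# Dry-run sketch: prints the Mutect2 commands instead of running them.
# All file and sample names below are placeholders, not real resources.

REF="reference.fa"   # placeholder; GRCm38 vs GRCm39 is still undecided (Q1)

# Matched mode: tumour BAM plus normal BAM.
# -normal takes the sample (SM) name of the normal, not a file path.
mutect2_matched() {
  echo gatk Mutect2 -R "$REF" -I "$1" -I "$2" -normal "$3" -O "$4"
}

# Tumour-only mode: same call without the normal BAM and without -normal.
mutect2_tumour_only() {
  echo gatk Mutect2 -R "$REF" -I "$1" -O "$2"
}

mutect2_matched T01.bam N01.bam N01 T01.matched.vcf.gz
mutect2_tumour_only T05.bam T05.tumour_only.vcf.gz
```

If this is right, the only difference between the two groups would be the extra `-I` for the normal BAM and the `-normal` sample name.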
Q3) I have trouble finding these two files:
1) `--known-sites sites_of_variation.vcf` for BaseRecalibrator
2) `--germline-resource af-only-gnomad.vcf.gz` for Mutect2 and the PON
I found 2 links for `--known-sites sites_of_variation.vcf`:
ftp://ftp.ncbi.nih.gov/snp/organisms/archive/mouse_10090/VCF/ and
ftp://ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/
Do I need to prepare the files as described in gatk-mouse-mm10.md in the igordot/genomics GitHub repository? Downloading a single file takes hours, and the NCBI connection keeps dropping...
I have also found the following VCF files. Are the two below suitable to use as `--known-sites sites_of_variation.vcf`? What is the difference between them, and which one should be used?
Sanger REL-1505 mouse strain-specific VCF:
Should I be using both C57...SNPs.vcf and C57...indels.vcf as input for `--known-sites sites_of_variation.vcf`?
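As far as I can tell, BaseRecalibrator accepts `--known-sites` more than once, so passing a SNP file and an indel file together would look something like the sketch below. Again this only prints the command; the helper function and all file names are placeholders I made up, not the actual Sanger file names.

```shell
#!/bin/sh
# Dry-run sketch: prints a BaseRecalibrator command with one --known-sites
# argument per VCF. File names are placeholders, not real resources.

# Usage: bqsr_cmd <reference> <input.bam> <output.table> <vcf> [<vcf> ...]
bqsr_cmd() {
  ref="$1"; bam="$2"; out_table="$3"
  shift 3
  known=""
  # Repeat the --known-sites flag once for every VCF that was passed in.
  for vcf in "$@"; do
    known="$known --known-sites $vcf"
  done
  echo "gatk BaseRecalibrator -R $ref -I $bam$known -O $out_table"
}

bqsr_cmd reference.fa sample.bam recal.table strain.snps.vcf.gz strain.indels.vcf.gz
```

Is supplying both files this way the intended usage, or should they be merged into a single VCF first?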
Lastly, I cannot find anything for the `--germline-resource af-only-gnomad.vcf.gz` that Mutect2 requires.
Could you help please?
Sorry for the million questions and thank you in advance!