Variant calling with GATK on human cancer samples aligned against GRCh38.p10
6.7 years ago
mihai72 ▴ 10

I'm doing a college project where I need to call variants on some human cancer samples. I've chosen to align the reads to the GRCh38.p10 assembly, but now I'm having a hard time finding the appropriate 1000 Genomes Indels VCF files to run BaseRecalibrator and downstream commands.

The latest GATK bundle seems to have VCFs for hg38, but I presume all the chromosome names are in the UCSC format (e.g. "chr1") and so not compatible with the contig names in my GRCh38 assembly.

My question is: can I use the Mills gold standard indels for build GRCh37, or does somebody know where I can find the latest ones for the GRCh38 assembly?

Also, the study that my samples are derived from used the Agilent SureSelect Human All Exon v4 kit for exome sequencing, and I read on the GATK website that I should use the -L argument with a custom BED file when running BaseRecalibrator on exome sequencing data. Does anyone have the corresponding BED file? I tried going to Agilent's eArray site but it appears to be down for me.

Failing that, I used Pierre Lindenbaum's command in this post to generate my own BED file, but it didn't work for me since it was based on hg38 and the chromosome names don't match my reference. I can fix that with a simple text replacement script, but I was wondering whether the hg38 exome coordinates are the same as the ones for the GRCh38 assembly.
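For illustration, the text replacement mentioned above could look like the following minimal sketch. It assumes that only the primary chromosomes need renaming ("chr1" to "1", "chrM" to "MT") and that the reference uses prefix-less names. The function names are hypothetical, and alt or unplaced contigs are passed through unchanged because their names differ in more than the prefix. Since hg38 is the UCSC name for the GRCh38 assembly, the coordinates on the primary chromosomes should not change, only the names.

```python
import re

# Assumption: only primary chromosomes (1-22, X, Y, M) are renamed.
# Alt/unplaced contigs are left untouched and need a separate mapping.
_PRIMARY = re.compile(r"^chr([0-9]{1,2}|X|Y)$")


def ucsc_to_plain_name(name):
    """Map a UCSC-style primary contig name to its prefix-less form."""
    if name == "chrM":
        return "MT"
    match = _PRIMARY.match(name)
    return match.group(1) if match else name


def rename_bed_line(line):
    """Rename the chromosome column of one BED record.

    Blank lines and comment/track/browser lines are returned unchanged.
    """
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    fields[0] = ucsc_to_plain_name(fields[0])
    return "\t".join(fields) + newline
```

Applied line by line to the generated BED file, this leaves start and end coordinates as they are and rewrites only the first column.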

Thanks for your help.

SNP genome next-gen alignment GATK • 2.6k views

You may consider re-aligning to GRCh37 / hg19 just for convenience. It takes time for resources to update after a new genome build release, even though hg38 has been out for some considerable time.

Your question will also most likely get a better response on the GATK forum itself.

Finally, you may consider a non-GATK somatic variant caller.


Has anyone managed to find GRCh38 versions of the 1000G indels and the Mills & 1000G gold standard indels?


Direct link to the Mills hg38 file: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz Roman Luštrik


How would one reconcile the UCSC/Ensembl difference in contig naming for this file? E.g. hg38 has chromosomes named "chr1" while GRCh38 has "1". There are probably differences in other contigs as well?
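One way to approach this, as a rough sketch rather than a tested pipeline: rewrite both the `##contig` header lines and the CHROM column using an explicit name map. The map below covers only the primary chromosomes and assumes the target reference uses "1".."22", "X", "Y" and "MT". Names for alt, random and unplaced contigs differ in more than the prefix, so they are left as they are here and would need a lookup table built from the reference itself. The function names are illustrative.

```python
import re

# Matches the ID field of a VCF contig header line, e.g. ##contig=<ID=chr1,length=...>
CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)")


def build_name_map():
    """UCSC-style to prefix-less names for primary chromosomes only (assumption)."""
    names = {f"chr{i}": str(i) for i in range(1, 23)}
    names.update({"chrX": "X", "chrY": "Y", "chrM": "MT"})
    return names


def rename_vcf_line(line, name_map):
    """Rename the contig in a VCF header or record line; other lines pass through."""
    if line.startswith("##contig="):
        return CONTIG_HEADER.sub(
            lambda m: m.group(1) + name_map.get(m.group(2), m.group(2)), line
        )
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    return name_map.get(chrom, chrom) + sep + rest
```

After renaming, the VCF has to be compressed and indexed again before GATK will accept it. `bcftools annotate --rename-chrs` takes a two-column mapping file and does the same job without custom code.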

