Question

Best Human Genome Reference File For Gatk?

11.4 years ago

newDNASeqer ▴ 790

On the GATK website (http://gatkforums.broadinstitute.org/discussion/1204/what-input-files-does-the-gatk-accept) and their public FTP server, I found a few different human genome references in their resource bundle: b37, b36, hg18, and hg19. This made me wonder which one I should use for exome-sequencing data analysis.

In my pipeline, I started using BWA-MEM with the hg19 reference. Should I stay consistent and use the same reference with GATK? It seems to me that the Broad Institute people recommend b37, since they say hg18, b36, etc. were lifted over from b37, so I'm confused here. Thanks.

gatk reference bwa

updated 11.4 years ago by Matt Shirley 10k • written 11.4 years ago by newDNASeqer ▴ 790

My practical advice would be to perform your BWA-MEM alignment using b37.

updated 5.2 years ago by Ram 44k • written 11.4 years ago by Matt Shirley 10k

score 4 · Answer 1 · 2013-07-10

11.4 years ago

Matt Shirley 10k

Yes, you do seem a bit confused. I find it's best to take a look at this FAQ from 1000 Genomes.

This GRCh37-derived alignment set includes chromosomal plus unlocalized and unplaced contigs, the rCRS mitochondrial sequence (AC:NC_012920), Human herpesvirus 4 type 1 (AC:NC_007605) and decoy sequence derived from HuRef, Human Bac and Fosmid clones and NA12878.

So it is derived from GRCh37, just as UCSC hg19 is, but it contains a different mitochondrial sequence, a herpesvirus sequence, and some additional unplaced contigs and decoy sequences. The most practical difference is contig naming: human chromosome 17, for example, is named 17 rather than chr17.
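
A minimal Python sketch of the naming difference, assuming you already have a reference's contig names (for example, the first column of its samtools .fai index). The helper names `naming_convention` and `ucsc_to_b37` are hypothetical and are not part of any GATK or samtools API. The rename is naive and covers only the primary chromosomes. chrM is deliberately left unmapped because the hg19 and b37 mitochondrial sequences differ, so renaming alone would not make them equivalent.

```python
import re

# Hypothetical helper: guess a reference's naming style from its contig names.
def naming_convention(contig_names):
    if not contig_names:
        return "unknown"
    ucsc_style = sum(1 for name in contig_names if name.startswith("chr"))
    if ucsc_style == len(contig_names):
        return "UCSC/hg19-style (e.g. chr17)"
    if ucsc_style == 0:
        return "b37-style (e.g. 17)"
    return "mixed"

# Hypothetical helper: map a UCSC primary chromosome name to b37 style.
# Returns None for anything else, including chrM: the hg19 chrM and b37 MT
# sequences differ, so a rename would not make them interchangeable.
def ucsc_to_b37(name):
    match = re.fullmatch(r"chr([0-9]{1,2}|X|Y)", name)
    return match.group(1) if match else None

print(naming_convention(["chr1", "chr17", "chrM"]))  # UCSC/hg19-style (e.g. chr17)
print(naming_convention(["1", "17", "MT"]))          # b37-style (e.g. 17)
print(ucsc_to_b37("chr17"), ucsc_to_b37("chrM"))     # 17 None
```

A name-based check like this cannot prove that two references contain the same sequences. Comparing contig lengths, or the M5 checksums that can appear on SAM @SQ lines, is a stricter test.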

And finally, yes: once you have generated an alignment against a specific reference genome, you need to use that same reference genome in all of your downstream analyses. I can't think of any good reason to switch. It's like asking whether you should use a French dictionary to decode an English text.
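
One way to catch a mismatch between your alignments and the reference you pass to downstream tools is to compare the @SQ lines of the BAM header (e.g. from `samtools view -H`) with the reference's .fai index. The Python sketch below assumes you already have both as text. The function names are hypothetical, and the inputs are toy data: the offsets are made up, while the chrM/MT lengths reflect the hg19 vs rCRS difference (16571 vs 16569 bp). Check against your own files.

```python
# Parse a samtools .fai index (tab-separated: name, length, offset, ...).
def parse_fai(fai_text):
    contigs = []
    for line in fai_text.splitlines():
        if line.strip():
            fields = line.split("\t")
            contigs.append((fields[0], int(fields[1])))
    return contigs

# Parse @SQ lines from a SAM/BAM header; SN is the contig name, LN its length.
def parse_sq(header_text):
    contigs = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # Each field is TAG:VALUE; split once so values like "file:/x" survive.
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            contigs.append((tags["SN"], int(tags["LN"])))
    return contigs

# Hypothetical check: BAM contigs whose name/length pair is missing from the reference.
def reference_mismatches(bam_contigs, ref_contigs):
    ref_lengths = dict(ref_contigs)
    return [(name, length) for name, length in bam_contigs
            if ref_lengths.get(name) != length]

# Toy inputs: a BAM aligned to an hg19-style reference vs a b37-style .fai.
bam_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr17\tLN:81195210\n@SQ\tSN:chrM\tLN:16571\n"
b37_fai = "17\t81195210\t0\t60\t61\nMT\t16569\t0\t60\t61\n"
print(reference_mismatches(parse_sq(bam_header), parse_fai(b37_fai)))
# [('chr17', 81195210), ('chrM', 16571)]
```

An empty result means every BAM contig has a same-named, same-length counterpart in the reference. That is a quick sanity check, not proof that the sequences are identical.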

11.4 years ago by Matt Shirley 10k

And note that contig/chromosome ordering matters to GATK in downstream processing, so pick a reference and stick with it throughout.

11.4 years ago by Sean Davis 27k
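
The ordering point can also be checked mechanically. This Python sketch uses a hypothetical helper, not GATK's own validation logic. It finds the first contig pair whose relative order differs between two contig lists, such as a BAM header's @SQ order and a reference's .fai order. Contigs present in only one list are ignored, and contig names are assumed to be unique.

```python
# Hypothetical helper: compare the relative order of contigs shared by two lists.
# Returns the first (a, b) pair where the orders diverge, or None if they agree.
def first_order_conflict(order_a, order_b):
    shared = set(order_a) & set(order_b)
    a = [contig for contig in order_a if contig in shared]
    b = [contig for contig in order_b if contig in shared]
    for contig_a, contig_b in zip(a, b):
        if contig_a != contig_b:
            return contig_a, contig_b
    return None

# Same names, different placement of chrM: the order check fails.
print(first_order_conflict(["chrM", "chr1", "chr2"], ["chr1", "chr2", "chrM"]))
# ('chrM', 'chr1')
print(first_order_conflict(["chr1", "chr2"], ["chr1", "chr2", "chrM"]))
# None
```

Passing this check does not guarantee that GATK will accept the files. It only rules out one common way for two otherwise similar references to disagree.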

Yes, I had just gone back to make this point in an edit!

11.4 years ago by Matt Shirley 10k

I switched references once in the GATK pipeline, and it was like walking on hot coals at every step.

11.4 years ago by Zev.Kronenberg 12k