I have done my exome alignment against the hg19 release (downloaded from here as hg19.2bit). Following the discussion here, I also included the chrUn and random contigs in my reference sequence. I followed the steps suggested by David and lh3 here, and have now generated my VCF file with GATK UnifiedGenotyper (step 10). However, my current output file does not contain dbSNP IDs.
I then tried the following command, which adds the dbSNP VCF file:
java -jar /software/GenomeAnalysisTK.jar -R /data/hg19/hg19.fa -T UnifiedGenotyper -I FOO.bam -B:dbsnp,VCF /data/dbsnp132/00-All.vcf -o Foo_raw.vcf
This produces the following error:
ERROR MESSAGE: Input files reads and reference have incompatible contigs: Order of contigs differences, which is unsafe.
The cause seems clear: the CHROM field of the VCF file uses the names 1, 2, ..., 22, X, Y, M and PAR, whereas my reference genome uses chr1, chr2, ..., chr22, chrX, chrY, chrM, followed by the chrUn and random contigs. I could modify the FASTA headers to fix the naming of the numbered chromosomes, X, Y and M.
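Instead of editing the FASTA headers (which would mean realigning), one alternative is to rename the contigs in the dbSNP VCF to the reference's chr-prefixed convention. The following is a minimal sketch, not a GATK feature: the helper names `to_ucsc` and `rename_vcf_lines` are hypothetical, and it assumes the VCF uses bare names such as `1`, `X` and `M`/`MT`. It only rewrites the CHROM column of data lines and passes header lines through unchanged.

```python
def to_ucsc(contig: str) -> str:
    """Map a bare contig name ("1", "X", "MT") to UCSC style ("chr1", "chrX", "chrM")."""
    if contig.startswith("chr"):
        return contig  # already UCSC-style, leave untouched
    if contig in ("M", "MT"):
        return "chrM"  # UCSC names the mitochondrion chrM
    return "chr" + contig  # e.g. "1" -> "chr1"; note "PAR" -> "chrPAR"


def rename_vcf_lines(lines):
    """Yield VCF lines with the CHROM column of data lines renamed.

    Header lines (starting with '#') are yielded unchanged.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        # CHROM is everything before the first tab; keep the rest verbatim
        chrom, sep, rest = line.partition("\t")
        yield to_ucsc(chrom) + sep + rest
```

To use it as a filter, something like `sys.stdout.writelines(rename_vcf_lines(sys.stdin))` could be run with the original VCF on stdin. This only fixes names: `PAR` becomes `chrPAR`, which the reference lacks, and the record order still has to match the reference. Also be aware that the UCSC hg19 chrM (NC_001807) is not the same sequence as the GRCh37 mitochondrion (rCRS), so chrM positions may not line up.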
Before redoing the alignment from the beginning, I would like to know whether using this VCF file with a reference that contains the chrUn and random contigs could cause further errors. Also, my reference genome does not have the PAR contig that appears in the VCF file.
Do you have any suggestions on how to deal with the additional contigs in hg19 that are not in the dbSNP VCF file, and with the PAR contig in the dbSNP VCF file that is not in my reference genome?
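One possible way to handle contigs that appear on only one side is to drop VCF records whose contig is absent from the reference (such as PAR) and sort the remaining records into the reference's contig order, taken from the first column of the samtools `.fai` index (a file such as `hg19.fa.fai` is assumed here). This is a sketch under the assumption that the VCF contig names already match the reference, for example after adding the `chr` prefix; `read_fai_order` and `reorder_vcf` are hypothetical helpers.

```python
def read_fai_order(fai_lines):
    """Return {contig: index} from the first column of a samtools .fai index."""
    order = {}
    for line in fai_lines:
        name = line.split("\t", 1)[0].strip()
        if name and name not in order:
            order[name] = len(order)  # position of the contig in the reference
    return order


def reorder_vcf(lines, order):
    """Sort VCF data lines into reference contig order.

    Returns (output_lines, dropped), where dropped counts records on
    contigs missing from the reference (e.g. PAR). Header lines keep
    their original order and come first.
    """
    header, records, dropped = [], [], {}
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t", 2)  # CHROM, POS, remainder
        chrom = fields[0]
        if chrom not in order:
            dropped[chrom] = dropped.get(chrom, 0) + 1
            continue
        records.append((order[chrom], int(fields[1]), line))
    # Sort by (reference contig index, POS); the sort is stable for ties
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return header + [rec[2] for rec in records], dropped
```

Reference contigs with no records in the VCF (the chrUn and random ones) are simply absent from the output; this sketch does not check whether GATK accepts that. It also holds every record in memory, which may be impractical for the full `00-All.vcf`; processing one contig at a time would be a lighter variant.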
Is there an alternative hg19/GRCh37 assembly with a matching dbSNP 132 VCF that I can use for my exome analysis?
The NIH link appears to be broken.
The current link is: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz
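Before rerunning GATK, it may help to check which contigs the downloaded file actually contains. The following sketch uses only the standard library, and its helpers `vcf_contigs` and `compare_contigs` are hypothetical. It streams the gzipped VCF from the link above, collects its CHROM names in order of first appearance, and compares them with the reference contig names, for example the first column of `hg19.fa.fai`. It assumes both lists use the same naming convention, so a VCF with bare names would need the `chr` prefix added first.

```python
import gzip


def vcf_contigs(path):
    """Return CHROM values in order of first appearance in a (gzipped) VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    seen = {}  # dict used as an insertion-ordered set
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)


def compare_contigs(vcf_names, ref_names):
    """Return (only_in_vcf, only_in_ref, shared_contigs_in_same_order)."""
    vcf_set, ref_set = set(vcf_names), set(ref_names)
    only_vcf = [c for c in vcf_names if c not in ref_set]
    only_ref = [c for c in ref_names if c not in vcf_set]
    # Compare the relative order of contigs present in both lists
    shared_vcf = [c for c in vcf_names if c in ref_set]
    shared_ref = [c for c in ref_names if c in vcf_set]
    return only_vcf, only_ref, shared_vcf == shared_ref
```

Here `only_vcf` would list contigs such as PAR that cannot be placed on the reference, `only_ref` would list contigs such as chrUn that have no dbSNP records, and the last value reports whether the shared contigs appear in the same order, which is what the error message complains about.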
Thanks a lot, Brad. I didn't know about this hg19 assembly from the Broad.