Convert files made for b37 into hg19 human reference?
16 months ago

Although hg19 and b37 are considered similar, they have some differences that affect preprocessing for variant calling.

Is there any tool to convert the reference files, such as the panel of normals or the known-sites files used with Mutect in the various preprocessing steps, from b37 into hg19?

I have read about liftover, but I am not sure whether it works for the .vcf files suggested in the resource bundle (https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle), such as 1000G_phase1.indels.b37.vcf, Mills_and_1000G_gold_standard.indels.b37.sites.vcf, or Mutect2-WGS-panel-b37.vcf.

GATK Variant-Calling Mutect2 • 1.8k views

You don't need those files for LiftoverVcf (https://gatk.broadinstitute.org/hc/en-us/articles/360037060932-LiftoverVcf-Picard-); you just need a VCF, a chain file, and a reference.

 java -jar picard.jar LiftoverVcf \
     I=input.vcf \
     O=lifted_over.vcf \
     CHAIN=b37tohg38.chain \
     REJECT=rejected_variants.vcf \
     R=reference_sequence.fasta

What is b38? Do you mean to say GRCh38?

If that is the case, they are similar only to the extent that they are both human genome builds. Other than that, they are substantially different (which is why they are "major" genome builds).

If you are looking for GRCh37/hg19 GATK resource files they should be available using the answer here: gatk legacy bundles (where to get Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz)

If you need the GATK resource files for GRCh38 they are available here: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/


I edited the post; I meant b37. I have read "gatk legacy bundles (where to get Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz)" and came to the conclusion that the b37 files are not compatible with hg19. Is the only way, therefore, to convert the b37 resources into hg19 format, as Pierre and Raphael mentioned?


There is no difference in the main chromosomes between b37 and hg19 (except chrY and MT; see the table linked below). Unless you are doing something very specific, it may be reasonable to use hg19 files.

See: https://gatk.broadinstitute.org/hc/en-us/articles/360035890711-GRCh37-hg19-b37-humanG1Kv37-Human-Reference-Discrepancies#comparison

 A USER ERROR has occurred: Input files reference and features have incompatible contigs: No 
  overlapping contigs found.
  reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, 
  chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, 
  chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, 
  chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, 
  chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, 
  chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, 
  chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, 
  chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, 
  chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, 
  chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, 
  chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, 
  chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, 
 chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, 
  chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, 
   chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, 
  chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
  features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, 
  GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, 
  GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, 
  GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1,  
  GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, 
  GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, 
  GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, 
   GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, 
   GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, 
    GL000194.1, GL000225.1, GL000192.1, NC_007605]

I understand. However, having the b37 version of the reference would make these two lists the same size and also fix the chr1 versus 1 naming issue (which I understand can be solved just by converting the names). But even after converting, the two lists would differ in size because of the differences you mentioned (https://gatk.broadinstitute.org/hc/en-us/articles/360035890711-GRCh37-hg19-b37-humanG1Kv37-Human-Reference-Discrepancies#comparison, Reference Table). Therefore, since there are no files available for hg19, is the only reasonable option to convert the b37 files to hg19?


Try CrossMap (https://crossmap.sourceforge.net/); it works for VCF files.

16 months ago
raphael.B ▴ 520

You can use Picard LiftoverVcf:

java -jar picard.jar LiftoverVcf \
  I=input.vcf \
  O=lifted_over.vcf \
  CHAIN=hg38Tohg19.over.chain.txt \
  REJECT=rejected_variants.vcf \
  R=reference_sequence.fasta

You can download the chain file here.

16 months ago
Zhenyu Zhang ★ 1.2k

One general thing when working with data/tools from the Broad: pay attention to contig names, because they are often 1/2/3... without the "chr" prefix.
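
To illustrate the contig-name point, here is a minimal Python sketch (not a GATK or Picard tool) that rewrites b37-style names in a VCF to hg19-style names. The function name `b37_to_hg19_names` and the `DEFAULT_CONTIGS` set are hypothetical, made up for this example. It assumes that only the autosomes and X are safe to rename by adding "chr"; Y and MT are left out because, as noted in the thread, they differ between the two builds, and the unplaced GL contigs are dropped rather than guessed at. It only touches the `##contig` header lines and the CHROM column, so contig names inside ALT or INFO fields are not handled.

```python
import re

# Contigs to rename by adding "chr". Y and MT are left out on purpose because
# the thread says they differ between b37 and hg19 (an assumption to revisit).
DEFAULT_CONTIGS = frozenset([str(n) for n in range(1, 23)] + ["X"])

CONTIG_HEADER = re.compile(r"^##contig=<ID=([^,>]+)")


def b37_to_hg19_names(lines, contigs=DEFAULT_CONTIGS):
    """Yield VCF lines with b37 contig names rewritten to hg19 style.

    Contig header lines and records for contigs outside `contigs` are dropped.
    """
    for line in lines:
        header = CONTIG_HEADER.match(line)
        if header:
            name = header.group(1)
            if name in contigs:
                yield line.replace("ID=" + name, "ID=chr" + name, 1)
        elif line.startswith("#"):
            # Other header lines pass through unchanged.
            yield line
        else:
            chrom, sep, rest = line.partition("\t")
            if chrom in contigs:
                yield "chr" + chrom + sep + rest


# Tiny in-memory example; a real file would be read line by line instead.
vcf = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=1,length=249250621>\n",
    "##contig=<ID=MT,length=16569>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\t.\t.\n",
    "MT\t200\t.\tC\tT\t.\t.\t.\n",
]
converted = list(b37_to_hg19_names(vcf))
print("".join(converted), end="")
```

Coordinates are not changed here, only names, so this is not a substitute for a chain-based liftover where the sequences really differ. The output would likely still need to be re-sorted and re-indexed against the hg19 sequence dictionary before GATK accepts it.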
