Hi there,
I have old BED and VCF files that were created using the b36 genome reference. Is there any way to update the genome coordinates to GRCh37/hg19?
Many thanks in advance, Anna
The NCBI Remap service can be used for this. Specifically, you can use this link with NCBI36/hg18 selected as the source assembly and GRCh37/hg19 as the target assembly.
I have also tried this link, but when I upload the remapped file to the Michigan Imputation Server, it gives the following error:
Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
However, my VCF does have the #CHROM header:
line 44: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 4023 (...)
line 45: 10 111955 rs7909677 A G . PASS PR;REMAP_ALIGN=FP GT 0/0 (...)
I uploaded the file. Can you please tell me how to upload the file here?
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20220228
##source=PLINKv1.90
##contig=<ID=0,length=2147483645>
##contig=<ID=1,length=247137335>
##contig=<ID=2,length=242697434>
##contig=<ID=3,length=199340831>
##contig=<ID=4,length=191167889>
##contig=<ID=5,length=180625440>
##contig=<ID=6,length=170747903>
##contig=<ID=7,length=158809727>
##contig=<ID=8,length=146264219>
##contig=<ID=9,length=140191297>
##contig=<ID=10,length=135237858>
##contig=<ID=11,length=134449983>
##contig=<ID=12,length=132209175>
##contig=<ID=13,length=114125099>
##contig=<ID=14,length=106356483>
##contig=<ID=15,length=100217561>
##contig=<ID=16,length=88690777>
##contig=<ID=17,length=78643089>
##contig=<ID=18,length=76116030>
##contig=<ID=19,length=63786939>
##contig=<ID=20,length=62382908>
##contig=<ID=21,length=46909249>
##contig=<ID=22,length=49565873>
##contig=<ID=23,length=154578240>
##contig=<ID=24,length=27167582>
##contig=<ID=25,length=154881767>
##contig=<ID=26,length=15609>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_filterVersion=1.10.2+htslib-1.10.2
##bcftools_filterCommand=filter -r 10 EPI.vcf.gz; Date=Mon Feb 28 12:17:19 2022
##INFO=<ID=REMAP_ALIGN,Number=1,Type=String,Description="Alignment type used for remapping (FP=first pass, SP=second pass)">
##INFO=<ID=REF_EDIT,Number=0,Type=Flag,Description="REF base modified during remapping due to either left shifting or difference in REF base between source and target assemblies.">
##NCBI_remap_source_assm="GCF_000001405.12"
##NCBI_remap_target_assm="GCF_000001405.13"
##NCBI_remap_align_date="2014-09-23 20:19:00"
##NCBI_remap_run_date="2022-02-28T08:24:12"
##NCBI_remap_batch_id="86373"
##NCBI_remap_align_parameters=<minratio=0.5,maxratio=2,multiloc=Y,mergefrag=N>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 4023
10 111955 rs7909677 A G . PASS PR;REMAP_ALIGN=FP GT 0/0
Is it ok like this, or should I upload the file another way?
While doing that, I noticed that at the end of the file I had a few lines with other chromosomes (e.g., HSCHRUN_RANDOM_CTG15). After removing them, the imputation worked.
Also, my original file with only chr11 had 33,421 lines and my remapped file has 12,055. Is it expected to lose that many variants?
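For anyone hitting the same problem, here is a minimal sketch of how the extra contig records could be dropped before uploading. It assumes an uncompressed, tab-delimited VCF that uses the PLINK-style numeric chromosome names (1 to 26) from the header shown above. The function name and the allowed set are my own choices, not part of any tool.

```python
# Assumption: chromosomes are named 1..26, as in the PLINK-generated
# ##contig lines of the remapped file. Adjust the set for other naming schemes.
STANDARD_CHROMS = {str(n) for n in range(1, 27)}


def keep_standard_contigs(lines, allowed=STANDARD_CHROMS):
    """Yield header lines unchanged and only records whose CHROM is allowed."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information and the #CHROM line are passed through as-is.
            yield line
            continue
        chrom = line.split("\t", 1)[0]
        if chrom in allowed:
            yield line


def filter_vcf(in_path, out_path):
    """Write a copy of in_path without records on non-standard contigs."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(keep_standard_contigs(src))
```

Calling filter_vcf("remapped.vcf", "remapped.clean.vcf") (both file names are placeholders) would write the cleaned copy. Records such as those on HSCHRUN_RANDOM_CTG15 are dropped. Header lines are left untouched.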
UCSC liftOver and Picard LiftoverVcf.
I tried Picard LiftoverVcf, but it gives me an empty output file (it only has the header lines), while all variants go to rejected_variants.vcf:
java -jar picard.jar LiftoverVcf I=chr11.vcf O=chr11b.vcf CHAIN=hg18ToHg19.over.chain REJECT=rejected_variants.vcf R=ucsc.hg19.fasta
Check the chromosome nomenclature (chr1 vs 1).
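If the VCF uses bare names such as 11 while the chain file and the reference FASTA use UCSC-style names such as chr11, every record is rejected. A rough sketch for adding the prefix follows. It assumes an uncompressed, tab-delimited VCF. The helper names are made up for illustration. MT is mapped to chrM on the assumption that the reference follows the UCSC hg19 convention.

```python
import re

# Matches the ID inside a ##contig header line, e.g. ##contig=<ID=11,length=...>
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def to_ucsc_name(chrom):
    """Map '11' -> 'chr11' and 'MT' -> 'chrM'; keep names already prefixed."""
    if chrom.startswith("chr"):
        return chrom
    if chrom == "MT":
        return "chrM"
    return "chr" + chrom


def add_chr_prefix(lines):
    """Yield VCF lines with UCSC-style names in ##contig headers and records."""
    for line in lines:
        if line.startswith("##contig="):
            yield CONTIG_ID.sub(
                lambda m: m.group(1) + to_ucsc_name(m.group(2)), line
            )
        elif line.startswith("#"):
            # Other header lines do not carry chromosome names.
            yield line
        else:
            chrom, sep, rest = line.partition("\t")
            yield to_ucsc_name(chrom) + sep + rest
```

The output of add_chr_prefix can be written to a new file and passed to LiftoverVcf as the input. Numeric PLINK codes such as 23 or 24 would need their own mapping (to chrX and chrY), which this sketch does not attempt.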
Yes, that was it. Silly mistake, sorry.
Still, the output file has 237,905 lines and the rejected_variants file has 436,954 lines, which seems like a lot. Is that expected?
It also gives me this warning: WARNING LiftoverVcf 137518 variants with a swapped REF/ALT were identified, but were not recovered.
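One way to judge whether the number of rejected variants is plausible is to tally why they were rejected. As far as I know, Picard records the rejection reason in the FILTER column of the rejected file, so counting the values in that column gives a quick breakdown. The sketch below assumes an uncompressed, tab-delimited rejected_variants.vcf. The function name is hypothetical.

```python
from collections import Counter


def count_reject_reasons(lines):
    """Count the values of the FILTER column (7th field) over all records."""
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines; they are not variants.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6:
            counts[fields[6]] += 1
    return counts


def summarize(path):
    """Print each FILTER value with its count, most frequent first."""
    with open(path) as handle:
        for reason, n in count_reject_reasons(handle).most_common():
            print(f"{reason}\t{n}")
```

Running summarize("rejected_variants.vcf") would show whether most rejections come from a single cause. If swapped REF/ALT alleles dominate, as the warning suggests, Picard's documentation lists a RECOVER_SWAPPED_REF_ALT option for LiftoverVcf that may be worth reading about before rerunning.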