Normal number of variants to lose during liftover: GRCh38 to hg19
10 months ago

I am using the 1000 Genomes files provided by the plink 2 author and lifting the positions over to hg19. To do this, I first convert to a VCF file, sort it with bcftools, and then run CrossMap to perform the liftover. Of 70,692,015 variants (only chr1-22 and XY included), 16,559,055 failed to map.

Is this to be expected? Or is something suspect with my pipeline?

liftover 1000genome CrossMap • 939 views

The liftover tool should provide a log of the variants that failed the process.
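
One way to go beyond a quick look at that log is to tally the failures by reason. The following is a minimal sketch, assuming the log of failed variants is a tab-separated text file whose last column carries a failure reason (recent CrossMap versions appear to write an `.unmap` file next to the output VCF in roughly this shape, but check your own file). The function name and the sample records are hypothetical.

```python
from collections import Counter


def tally_fail_reasons(lines):
    """Count failed records by the value in their last tab-separated column.

    Assumption: each non-header line ends with a failure reason. Header
    lines (starting with '#') and empty lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        counts[line.split("\t")[-1]] += 1
    return counts


# Made-up records for illustration only; not taken from a real run.
sample = [
    "##fileformat=VCFv4.2\n",
    "chr1\t10177\t.\tA\tAC\t.\t.\t.\tFail(Unmap)\n",
    "chr1\t10352\t.\tT\tTA\t.\t.\t.\tFail(Unmap)\n",
    "chr2\t20001\t.\tG\tA\t.\t.\t.\tFail(REF==ALT)\n",
]
reason_counts = tally_fail_reasons(sample)
for reason, n in reason_counts.most_common():
    print(reason, n)
```

To run it on a real log, pass an open file handle instead of `sample`, for example `tally_fail_reasons(open(path))` with `path` pointing at your own file. A single reason dominating the counts would suggest a pipeline problem rather than genuine assembly differences.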


It did and nothing stood out with a quick look. I am curious about other people's experience with liftover and the expected loss in variants from GRCh38 to hg19.


How are you doing the liftover?


I used CrossMap to convert the VCF file, which I generated from plink2's 1000 Genomes files.

https://crossmap.sourceforge.net/#convert-vcf-format-files

As this program uses the reference genome (not the chain file), maybe this isn't appropriate. I am going to try using the chain file and merely updating the positions.


You should definitely provide both the chain file and the reference genome to CrossMap. How else is CrossMap supposed to know how to lift the variants?
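
As a rough illustration of passing both inputs, here is a sketch that only assembles the command line for CrossMap's `vcf` mode, in the positional order given in the CrossMap documentation (chain file, input VCF, reference FASTA of the target assembly, output VCF). The helper name and all file names except the chain file are hypothetical, and the executable may be called `CrossMap.py` in older installations.

```python
import shlex


def crossmap_vcf_command(chain_file, input_vcf, target_fasta, output_vcf,
                         executable="CrossMap"):
    """Build the argument list for a CrossMap VCF liftover.

    Assumption: positional order is chain, input VCF, target-assembly
    reference FASTA, output VCF. Verify against `CrossMap vcf -h`.
    """
    return [executable, "vcf", chain_file, input_vcf, target_fasta, output_vcf]


cmd = crossmap_vcf_command(
    "hg38ToHg19.over.chain.gz",  # UCSC chain file named in this thread
    "input.hg38.vcf",            # hypothetical sorted input VCF
    "hg19.fa",                   # hypothetical hg19 (target) reference FASTA
    "output.hg19.vcf",           # hypothetical output path
)
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files. For a GRCh38 to hg19 liftover the FASTA should be the hg19 sequence, since the reference alleles are checked against the target assembly.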


Using a version of the high-coverage 1000 Genomes Project callset with 63,993,411 non-singleton bi-allelic SNVs and 9,459,059 non-singleton bi-allelic indels, together with the hg38ToHg19.over.chain.gz UCSC chain file, I get 916,020 SNVs and 63,685 indels dropped using CrossMap/VCF, and 872,258 SNVs and 55,590 indels dropped using BCFtools/liftover. So what you are seeing is not expected.
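
To put the counts in this thread on a common scale, the sketch below converts them to drop rates. It uses only the numbers quoted above; the function and variable names are illustrative.

```python
def drop_rate(dropped, total):
    """Fraction of variants lost during liftover."""
    return dropped / total


rates = {
    # Numbers reported in the original question
    "question (CrossMap, all variants)": drop_rate(16_559_055, 70_692_015),
    # Numbers reported in this reply
    "CrossMap/VCF SNVs": drop_rate(916_020, 63_993_411),
    "CrossMap/VCF indels": drop_rate(63_685, 9_459_059),
    "BCFtools/liftover SNVs": drop_rate(872_258, 63_993_411),
    "BCFtools/liftover indels": drop_rate(55_590, 9_459_059),
}

for label, rate in rates.items():
    print(f"{label}: {rate:.2%}")
```

The question's loss works out to roughly 23% of variants, whereas each category in this reply stays below 1.5%, which is the gap that points to a pipeline problem.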
