Why is UCSC lift over failing to convert a specific position?
3
1
Entering edit mode
5 months ago

I tried converting chr1:27755669-27755669 from GRCh37 to GRCh38 using https://genome.ucsc.edu/cgi-bin/hgLiftOver. This corresponds to SNP rs2474293

However it fails to convert it. What is going on? is this a bug?

It should have a corresponding coordinate in GRCh38 https://www.ncbi.nlm.nih.gov/snp/?term=rs2474293

I know I can just take it from here, problem is that I am dealing with thousands of SNPs not just this one. And I'm afraid I will encounter a similar error in the rest of my files.

I first found this error using pyliftover and then when I checked on UCSC it gave the same, an error.

EDIT: this happens for several SNPs, not only this one. rs2236566 and rs12728058 are other examples.

UCSC liftover • 936 views
ADD COMMENT
5
Entering edit mode
5 months ago

UCSC liftover should not be used to liftover variants as it is designed for lifting over intervals. As @cmdcolin explained, there is a gap in the chain file, in part due to hg19 and hg38 representing different alleles for the very polymorphism you are trying to lift over. This is a common issue with many liftover tools as I explain in my own liftover paper. Using the BCFtools/liftover plugin this problem is automatically addressed:

echo -e  "##fileformat=VCFv4.2\n##contig=<ID=chr1,length=249250621>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t27755669\trs2474293\tA\tC\t.\t.\t." | bcftools +liftover -- -s hg19.fa -f hg38.fa -c hg19ToHg38.over.chain.gz | bcftools view -H
Lines   total/swapped/reference added/rejected: 1/1/0/0
chr1    27429159    rs2474293   C   A   .   .   SWAP=1

In this case the region around the chain gap is automatically realigned and the correct location for the marker is then identified. The SWAP=1 INFO field explains that reference and alternate alleles were swapped

ADD COMMENT
1
Entering edit mode
5 months ago
cmdcolin ★ 4.0k

Note that NCBI has the data about rsID positions on both GRCh37 and GRCh38 in dbSNP. example https://www.ncbi.nlm.nih.gov/snp/rs2474293#variant_details

as far as why that particular position is failing to liftover, the liftover chain file has a completely replaced sequence there that might make it hard to properly covnert

see screenshot of jbrowse 2 visualization of the chain file at this position

and indeed, the NCBI shows the reference letter for the SNP changes

it's an A in GRCh37

GRCh37.p13 chr 1    NC_000001.10:g.27755669A>C
GRCh37.p13 chr 1    NC_000001.10:g.27755669A>G
GRCh37.p13 chr 1    NC_000001.10:g.27755669A>T

and a C in GRCh38

GRCh38.p14 chr 1    NC_000001.11:g.27429159C>A
GRCh38.p14 chr 1    NC_000001.11:g.27429159C>G
GRCh38.p14 chr 1    NC_000001.11:g.27429159C>T

jbrowse 2 screenshot of hg19 vs hg38

link to session https://jbrowse.org/code/jb2/v2.12.0/?config=test_data%2Fconfig_demo.json&session=share-IKMCQKNPp6&password=8Y8Wm

ADD COMMENT
0
Entering edit mode

random note: one could argue that the chain file is sort of inaccurate there. It could have been a 1bp SNP change (the A->C) and then a 1bp insertion (the ACT->ACTT). I don't know exactly how liftover internals work there, ucsc team may know more

ADD REPLY
0
Entering edit mode

and with the other ones, it looks like https://www.ncbi.nlm.nih.gov/snp/rs2236566#variant_details is outdated and it only exists on an older assembly (NC_000001.9 is chr1 from a older assembly, NC_000001.10 is chr1 on GRCh37) and then the third one rs12728058 is in a very chaotic region of the genome (subtelomere)...this is a zoomed out view of that. your SNP is in the middle of there, so even though NCBI confidently lists where it is on GRCh38, it is maybe difficult for liftover to do so (NCBI uses flanking sequences to help https://www.ncbi.nlm.nih.gov/snp/rs12728058)

enter image description here

https://jbrowse.org/code/jb2/v2.12.0/?config=test_data%2Fconfig_demo.json&session=share-nCBvHC3odD&password=ib20C

ADD REPLY
0
Entering edit mode
5 months ago

Did you see this note on the UCSC liftOver start page? "We do not recommend liftOver for SNPs that have rsIDs. See our FAQ for more information." - Did you have a look at the FAQ?

ADD COMMENT
1
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 2058 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6