Lift over of GWAS summary stat file from Hg38 to Hg19
3.8 years ago
AVA ▴ 40

Hello,

I am using the UCSC lift over tool and the associated chain to lift over the results of my GWAS summary statistic file (a tab separated file) from build 38 to build 37. The GWAS summary stat file looks like:

1 chr1_17626_G_A 17626 A G 0.016 -0.0332 0.0237 0.161
1 chr_20184_G_A  20184 A G 0.113 -0.185  0.023 0.419

Following is the UCSC tool with the associated chain I am using:

I want to create a BED file from the GWAS summary stat file, since that is the input format the tool requires. The first three columns should be tab separated, and the remaining columns should be merged into a single column using a non-tab separator such as ":", so that they are preserved while running the liftover. The first three columns of the input BED file would be:

awk 'BEGIN {OFS="\t"} {print "chr"$1, $3-1, $3}' GWAS_summary_stat_file > ucsc.input.file

# column 1: chrX, where X is the chromosome number
# column 2: SNP position - 1 (0-based start)
# column 3: SNP bp position in hg38 (end)

The above three are the required columns for the tool.

My questions are:

  1. How can I use a non tab separator say ":" to merge rest of the columns of the GWAS summary stat file in one column?
  2. After running the liftover, how can I unpack the columns separated by :?
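For reference, the packing and unpacking asked about in the two questions could be sketched roughly as below. This assumes a tab-separated input and that ":" never occurs inside any field. The file name sumstats.tsv and its two rows are hypothetical, and the liftOver step itself is left out so that only the round trip is shown.

```shell
# Hypothetical two-row, tab-separated input modelled on the example rows.
printf '1\tchr1_17626_G_A\t17626\tA\tG\t0.016\t-0.0332\t0.0237\t0.161\n' > sumstats.tsv
printf '1\tchr1_20184_G_A\t20184\tA\tG\t0.113\t-0.185\t0.023\t0.419\n' >> sumstats.tsv

# Pack: columns 1-3 are the BED coordinates, column 4 holds every
# original field joined with ":" (assumes no field contains ":").
awk 'BEGIN {FS = OFS = "\t"} {
    packed = $1
    for (i = 2; i <= NF; i++) packed = packed ":" $i
    print "chr"$1, $3 - 1, $3, packed
}' sumstats.tsv > packed.bed

# liftOver would be run on packed.bed at this point; here the unlifted
# file is unpacked directly to show the round trip.

# Unpack: split column 4 on ":" and append the pieces as tab-separated
# fields after the three coordinate columns.
awk 'BEGIN {FS = OFS = "\t"} {
    n = split($4, f, ":")
    out = $1 OFS $2 OFS $3
    for (i = 1; i <= n; i++) out = out OFS f[i]
    print out
}' packed.bed > unpacked.tsv
```

After unpacking, columns 4 onward are the original row, and columns 1-3 carry the BED coordinates.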
Hg38 linux Liftover GWAS Hg19 • 3.2k views
4 days ago

As explained here, I would strongly advise against using UCSC liftOver to change the reference build of summary statistics. You can use BCFtools/munge to convert the summary statistics to GWAS-VCF format and then BCFtools/liftover to change the reference genome. This correctly handles reference/alternate allele swaps, strand flips, and indels. You can find BCFtools/munge and BCFtools/liftover here. If you prefer to use UCSC liftOver anyway, I would at least remove all indels first, so that you do not introduce biases downstream.
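If you do go the UCSC route, removing indels could look something like the sketch below. It assumes the two alleles sit in columns 4 and 5, as in the example rows of the question, and are written as uppercase bases. The file names and the indel row are hypothetical.

```shell
# Hypothetical input: one SNV row and one made-up indel row, tab-separated.
printf '1\tchr1_17626_G_A\t17626\tA\tG\t0.016\t-0.0332\t0.0237\t0.161\n' > sumstats.tsv
printf '1\tchr1_30000_G_GT\t30000\tG\tGT\t0.2\t0.01\t0.02\t0.5\n' >> sumstats.tsv

# Keep only rows where both alleles (assumed to be columns 4 and 5)
# are exactly one of A, C, G or T; anything else is dropped.
awk 'BEGIN {FS = "\t"} $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/' \
    sumstats.tsv > sumstats.snv_only.tsv
```

Matching on single A/C/G/T characters rather than on allele length also drops rows that encode indels as "I", "D" or "-".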


Thanks a lot for the recommendation to use bcftools +munge and +liftover instead of UCSC liftOver. I'll definitely try that!

2.2 years ago
arturtjaro ▴ 40

There's a simpler and cleaner solution.

  1. Make the bed file as normal using awk, and carry over all the columns ($0 here):
    awk 'BEGIN {OFS="\t"} {print "chr"$1, $3-1, $3, $0}' GWAS_summary_stat_file > ucsc.input.bed
    
  2. Lift over using the -bedPlus=3 option. This does the liftover based on the first three columns and carries all the remaining columns along for the ride:
    liftOver -bedPlus=3 -tab ucsc.input.bed hg38ToHg19.over.chain ucsc.output.bed ucsc.unmapped.bed
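After the liftover, the carried-over columns still hold the old chromosome and position, so a small post-processing step may be wanted. The sketch below assumes the output layout from the steps above (lifted BED coordinates in columns 1-3, the original row in columns 4 onward, with the original position in column 6). The row it writes is a stand-in with placeholder coordinates, not real liftOver output, and sumstats.hg19.tsv is a hypothetical file name.

```shell
# Stand-in for liftOver output. The coordinates 1000/1001 are placeholders
# chosen for illustration only, not real hg19 positions.
printf 'chr1\t1000\t1001\t1\tchr1_17626_G_A\t17626\tA\tG\t0.016\t-0.0332\t0.0237\t0.161\n' > ucsc.output.bed

# Overwrite the carried-over chromosome (column 4) and position (column 6)
# with the lifted values, then print only the original columns (4 onward).
awk 'BEGIN {FS = OFS = "\t"} {
    chrom = $1
    sub(/^chr/, "", chrom)   # drop the "chr" prefix added for liftOver
    $4 = chrom
    $6 = $3                  # BED end is the 1-based position
    line = $4
    for (i = 5; i <= NF; i++) line = line OFS $i
    print line
}' ucsc.output.bed > sumstats.hg19.tsv
```

Note that the variant ID column is left untouched here, so IDs such as chr1_17626_G_A would still encode the hg38 position.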
    

Quick question: do we need to take anything special into account when lifting over indels, or is your approach (using awk 'BEGIN {OFS="\t"} {print "chr"$1, $3-1, $3, $0}') also reliable for them?

Since the GWAS summary statistics don't provide an explicit end position, should I adjust the end coordinate based on the indel length, or is it fine to treat all variants as 1 bp for the liftover?
