Picard LiftOver rejecting all Variants
2
1
Entering edit mode
2.0 years ago
nanodano ▴ 30

Hi everyone,

Pretty new you lifting over data here, many thanks for your help!

I have a dataset I'd like to liftover to HG19, rather than it's current HG38 alignment. I'm trying to use picard liftover, but no matter what I do all the variants get rejected.

This is the current script I'm running:

INPUT="/share/hennlab/projects/sa_ponderosa/03-PHASE/khomani_liftover/hg38/khomani100_hg38" 
CHAIN="/share/hennlab/reference/liftover_references/hg38ToHg19.over.chain" 
OUTPUT="/share/hennlab/projects/sa_ponderosa/03-PHASE/khomani_liftover/hg19"

java -jar /share/hennlab/projects/sa_ponderosa/my_scripts/picard.jar LiftoverVcf \
I=${INPUT}.vcf \
O=${OUTPUT}/khomani100_hg19lifted.vcf.gz \
CHAIN=${CHAIN} \
REJECT=${OUTPUT}/rejected_khomani100_hg19lifted_variants.vcf \
MAX_RECORDS_IN_RAM=100000 \
WARN_ON_MISSING_CONTIG=true \
RECOVER_SWAPPED_REF_ALT=true \
REFERENCE_SEQUENCE=/share/hennlab/projects/sa_ponderosa/03-PHASE/khomani_liftover/hg19.nochr.fa \
> ${OUTPUT}/khomani100_hg19lifted.stdout 2> ${OUTPUT}/khomani100_hg19lifted.stderr

The hg19.nochr.fa file does have a corresponding .dict file in that directory as well, generated with picard CreateSequenceDictionary. It looks likes this:

>1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

I removed the "chr" from throughout the file to see if that was causing the problem and it still resulted in the same outcome, all rejected variants.

Am I missing something obvious? Any help is appreciated!

hg19 picard liftover • 3.5k views
ADD COMMENT
0
Entering edit mode

Could you post your stdout and stderr files too?

ADD REPLY
4
Entering edit mode
2.0 years ago

Most likely the issue is that your VCF has contigs named 1,2,...,22 while your chain file has contigs named chr1,chr2,...,chr22, which is a typical issue with Picard LiftoverVcf. I would advise instead to use BCFtools +liftover from here which: (i) is flexible with contig names and (ii) correctly handles indel records

ADD COMMENT
2
Entering edit mode
2.0 years ago
nanodano ▴ 30

Wow, thank you! That was exactly the problem. For anyone else that comes across this: You can use @freeseek's BCFtools +liftover suggestion, or I also resolved it by using BCFtools +annotate (see below).

module load bcftools
# Create a table for name conversion 
# col1=old name; col2=new name; In this case col2 has "chr" prefixes to chromosome numbers

NameConversionTable='NameConversion.txt' 

bcftools annotate --rename-chrs $NameConversionTable ${INPUT}.vcf > ${INPUT}.chrnamed.vcf

java -jar /share/hennlab/projects/sa_ponderosa/my_scripts/picard.jar LiftoverVcf \
I=${INPUT}.chrnamed.vcf \
O=${OUTPUT}/khomani100_hg19lifted.vcf \
CHAIN=${CHAIN} \
REJECT=${OUTPUT}/khomani100_hg19lifted_variants_rejected.vcf \
MAX_RECORDS_IN_RAM=100000 \
WARN_ON_MISSING_CONTIG=true \
RECOVER_SWAPPED_REF_ALT=true \
REFERENCE_SEQUENCE=${GENOME_hg19} \
> ${OUTPUT}/khomani100_hg19lifted.stdout 2> ${OUTPUT}/khomani100_hg19lifted.stderr

That lifted all but 2 variants!

Also these two lines are super important

MAX_RECORDS_IN_RAM=100000 \
RECOVER_SWAPPED_REF_ALT=true \

The first will make sure that it doesn't crash because it tried to use too much memory. The second is good if the files have been in plink at some point because the REF and ALT alleles may have been reversed and this will fix that. Which is true in my case, a portion of variants were swapped. Otherwise it just fails them without correction.

ADD COMMENT
2
Entering edit mode

Also, do notice that if you convert an indel VCF record to PLINK without the option --keep-allele-order you might irreversibly lose information about which allele is the reference allele. This cannot then be recovered unless you know whether the reference allele is the major allele or the minor allele. As an example, these two GRCh37 variants are different but would look like the same variant to PLINK:

20 31022441 . A AG
20 31022441 . AG A

The RECOVER_SWAPPED_REF_ALT=true option in Picard LiftoverVcf only works for SNPs and does not work for indels. Indeed Picard LiftoverVcf will process indels to a new reference while failing to swap the reference and alternate alleles when needed, leading to corrupt genotype data for a subset of the indels in a VCF.

ADD REPLY

Login before adding your answer.

Traffic: 1435 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6