SHAPEIT using VCF unphased genotype input
3.0 years ago

I can get SHAPEIT to work with the default Plink PED/MAP format input files, but not with a VCF as input.

As an example, here I use the demo data that comes with SHAPEIT, which runs well.

DEMO=/Users/michaelflower/bin/shapeit.v2.904.3.10.0-693.11.6.el7.x86_64/example

shapeit -B $DEMO/gwas.bed $DEMO/gwas.bim $DEMO/gwas.fam \
-M $DEMO/genetic_map.txt \
-O "$DIR"/shapeit/gwas.phased

However, when I try to use the VCF file they provide, it fails.

gunzip "$DIR"/shapeit/demo/gwas.vcf.gz

Initially when I run with --input-vcf, like this:

shapeit --input-vcf $DIR/shapeit/demo/gwas.vcf \
M $DEMO/genetic_map.txt \
-O "$DIR"/shapeit/gwas.phased

I get the error:

Phaser mode : unrecognised option '--input-vcf'

And when I try with the abbreviated format:

shapeit -V $DIR/shapeit/demo/gwas.vcf \
M $DEMO/genetic_map.txt \
-O "$DIR"/shapeit/gwas.phased

I just get this printed in the terminal, but no output files are produced in the output directory.

Segmented HAPlotype Estimation & Imputation Tool
  * Authors : Olivier DELANEAU, Jean-François ZAGURY & Jonathan MARCHINI
  * Contact : olivier.delaneau@gmail.com
  * Webpage : http:://www.shapeit.fr
  * Version : v2.r648
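
One thing that may be worth ruling out here: the banner reports v2.r648, while the demo directory above is named for v2.904, so a different shapeit binary may be first on PATH. A minimal sketch for listing every candidate in PATH order follows; `find_on_path` is a hypothetical helper written for this post, not part of SHAPEIT.

```shell
# Hypothetical helper: print every executable called "$1" found on PATH,
# in search order. The first line printed is what a bare command name runs.
find_on_path() {
    name="$1"
    old_ifs="$IFS"
    IFS=:
    for d in $PATH; do
        candidate="${d:-.}/$name"   # an empty PATH entry means the current directory
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            echo "$candidate"
        fi
    done
    IFS="$old_ifs"
}

find_on_path shapeit
```

If more than one path is printed, calling the intended binary by its full path avoids any ambiguity.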

I'd be very grateful for any help getting this working, thanks.


You should strongly consider using shapeit4 - shapeit2 is about 10 years old now.


Thanks, I'm trying to install shapeit4 with conda, but am getting:

PackagesNotFoundError: The following packages are not available from current channels:
  - shapeit4

I managed to solve this by converting the VCF to PLINK format:

#=================================================================
# Convert VCF to plink format
#=================================================================
# https://www.biostars.org/p/207388/
# https://www.cog-genomics.org/plink2/data#recode

# Install plink
#conda install -c bioconda plink
conda create -n plink -c conda-forge -c bioconda plink

# Enter plink environment
conda activate plink

# Set VCF shortcut
VCF="$DIR"/wgs/130iPSC_061118.snp.vcf.gz

# Convert to plink binary format (bed, bim, fam)
plink --vcf "$VCF" --make-bed --out "$DIR/plink/$PREFIX"

# Convert to plink ped format (ped, map)
plink --vcf "$VCF" --recode --out "$DIR/plink/ped/$PREFIX"



# SHAPEIT can only phase one chromosome at a time,
# so each chromosome needs its own input file
# https://bioinformatics.stackexchange.com/questions/2883/problem-of-ordering-in-physical-positions-phasing-snps-with-shapeit

for chr in $(seq 1 22); do
    plink --file "$DIR/plink/ped/$PREFIX" --chr "$chr" --recode \
        --out "$DIR/plink/ped/${PREFIX}_chr${chr}"
done

# Exit plink environment
conda deactivate
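
With the per-chromosome PED/MAP files written, the phasing step itself can be sketched as a loop over the same chromosomes. This is a rough sketch rather than a tested pipeline: `-P`, `-M` and `-O` are SHAPEIT2 options, but `MAPDIR`, the `genetic_map_chr<N>.txt` file naming, the `study` placeholder prefix and the `DRY_RUN` switch are assumptions made here for illustration.

```shell
# Sketch only. MAPDIR, the genetic map file names, the "study" placeholder
# and DRY_RUN are assumptions, not part of the steps above.
DIR="${DIR:-.}"
PREFIX="${PREFIX:-study}"
MAPDIR="${MAPDIR:-$DIR/genetic_maps}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

phase_chr() {
    chr="$1"
    stem="$DIR/plink/ped/${PREFIX}_chr${chr}"
    # Build the SHAPEIT command as the positional parameters
    set -- shapeit -P "$stem.ped" "$stem.map" \
        -M "$MAPDIR/genetic_map_chr${chr}.txt" \
        -O "$DIR/shapeit/${PREFIX}_chr${chr}.phased"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

for chr in $(seq 1 22); do
    phase_chr "$chr"
done
```

Leaving `DRY_RUN=1` prints the 22 commands so the paths can be checked first; setting `DRY_RUN=0` runs them.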