Beagle files using the latest 1000 genomes
9.0 years ago

Hi,

I would like to get the latest BEAGLE files built from the VCF files for phase 3 of the 1000 Genomes data (2504 unrelated individuals). The BEAGLE reference panel is here:

http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/

It is based on these 1000 Genomes VCF files: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

In particular, I am trying to create files like the ones that were available for the previous releases:

ALL.chr1.phase1_release_v2.20101123.filt.bgl.gz
ALL.chr16.phase1_release_v2.20101123.filt.tabix.gz
ALL.chr1.phase1_release_v2.20101123.filt.markers

Would I need to use the script here with the BEAGLE utilities?

https://data.broadinstitute.org/srlab/BEAGLE/1kG-beagle-release3/READ_ME_beagle_phase1_v3

Thank you so much for any advice about how to get these files in beagle, very very much appreciated...

1000Genomes • 3.7k views
9.0 years ago
Kamil ★ 2.3k

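Use the BEAGLE tools to change the file format. Here's an example that should get you started. It downloads the bref conversion tool and the chromosome 22 reference panel, converts the bref file to a gzipped VCF, and prints the first genotypes as allele letters: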

wget https://faculty.washington.edu/browning/beagle/bref.09Nov15.d2a.jar
wget http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/individual_chromosomes/chr22.1kg.phase3.v5a.bref
java -jar bref.09Nov15.d2a.jar chr22.1kg.phase3.v5a.bref | gzip > chr22.1kg.phase3.v5a.vcf.gz
zcat chr22.1kg.phase3.v5a.vcf.gz | head -n6 | cut -c1-100 | grep -v '^#' | perl -ane 'print join("\t",@F[0..4]),"\t"; $i=0; foreach $G (@F[9..$#F]) { @A = split("\\|", $G, 2); print " " if $i++; print $F[3+$A[0]]," ",$F[3+$A[1]]; }; print "\n"'

Output

22    16050115    rs587755077    G    A    G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G
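
The Perl one-liner above looks up each allele as column 4 (REF) or column 5 (ALT), so it effectively assumes biallelic sites, and `cut -c1-100` truncates each record. As a rough sketch only, the same conversion could be written with awk so that multi-allelic ALT fields and missing alleles are handled. The function name `vcf_to_alleles` is hypothetical and not part of the BEAGLE utilities, and the output layout simply mirrors the example above; it is not the .bgl format.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a BEAGLE tool): print the first five VCF columns,
# then the alleles of every genotype as letters, separated by spaces.
vcf_to_alleles() {
  awk -F'\t' '
    /^#/ { next }                          # skip VCF header lines
    {
      # allele[0] is REF; allele[1..n] are the comma-separated ALT alleles
      n = split($5, alt, ",")
      allele[0] = $4
      for (k = 1; k <= n; k++) allele[k] = alt[k]

      printf "%s\t%s\t%s\t%s\t%s\t", $1, $2, $3, $4, $5
      sep = ""
      for (i = 10; i <= NF; i++) {
        split($i, sub_fields, ":")         # GT is the first subfield
        m = split(sub_fields[1], gt, "[|/]")
        for (j = 1; j <= m; j++) {
          g = gt[j]
          # anything that is not a valid allele index is printed as "."
          a = ((g ~ /^[0-9]+$/) && (g + 0 <= n)) ? allele[g + 0] : "."
          printf "%s%s", sep, a
          sep = " "
        }
      }
      printf "\n"
    }
  '
}
```

Assuming the VCF produced in the third command above, it could be used as `zcat chr22.1kg.phase3.v5a.vcf.gz | vcf_to_alleles | head`.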

Thank you very much Kamil!

Would this give me the same files as running the script described here?

https://data.broadinstitute.org/srlab/BEAGLE/1kG-beagle-release3/READ_ME_beagle_phase1_v3

I am trying to get the .filt.bgl.gz, .filt.tabix.gz, and .filt.markers files so that I can run EPIGWAS, but using this release of the 1000 Genomes data instead:

wget 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL*'
wget 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/phase1*'
wget 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/README*'

Thanks again!!


For filtered variants, you might consider taking the files from the BEAGLE website instead of the 1000 Genomes website. The developer of BEAGLE filtered the variants from 1000 Genomes.
