My final objective is to get the minor allele frequencies (MAF) for all 1000 Genomes SNPs (in H. sapiens GRCh37, in case you ask). I specifically need data from the low-coverage Phase 1 of the project, as I require unbiased low-coverage data for a machine learning model.
I have the 1000 Genomes VCF and I'm attempting to install both VEP 86 and vcf2maf to obtain the data I need. I want VEP 86 (instead of the current version, 89) because vcf2maf requires the archived version of VEP, and I don't know how to make it work with the latest VEP version.
As pointed out in a previous question (www.biostars.org/p/123822/), I'm following the instructions at this link to install vcf2maf: vcf2maf
which also points to these VEP installation instructions: VEP
I successfully installed Perl 5.22 in the path required by VEP, as described in the link below, so this step is done. perl
I'm currently stuck at the following step of the VEP installation (again, see VEP):
Download and unpack VEP's offline cache for GRCh37, GRCh38, and GRCm38:
> rsync -zvh rsync://ftp.ensembl.org/ensembl/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh{37,38}.tar.gz $VEP_DATA
> rsync -zvh rsync://ftp.ensembl.org/ensembl/pub/release-86/variation/VEP/mus_musculus_vep_86_GRCm38.tar.gz $VEP_DATA
> cat $VEP_DATA/*_vep_86_GRC{h37,h38,m38}.tar.gz | tar -izxf - -C $VEP_DATA
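Since I only need human GRCh37, I assume the unpack step above reduces to a single archive, in which case the `cat` and tar's `-i` flag (used there because several archives are concatenated) should not be needed. A minimal sketch under that assumption; `unpack_cache` is a hypothetical helper name, not part of VEP or vcf2maf:

```shell
# Hypothetical helper: unpack one gzipped cache archive into a data directory.
unpack_cache() {
    archive=$1   # path to the .tar.gz file
    dest=$2      # directory to extract into (created if missing)
    mkdir -p "$dest" &&
    tar -zxf "$archive" -C "$dest"
}

# Assumed usage once the GRCh37 archive has been downloaded into $VEP_DATA:
# unpack_cache "$VEP_DATA/homo_sapiens_vep_86_GRCh37.tar.gz" "$VEP_DATA"
```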
I know the path given in the instructions is wrong. When I try it, the command runs but hangs forever:
ftp.ensembl.org/ensembl/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz
The currently correct path is below. Note that I'm only interested in human GRCh37:
ftp.ensembl.org/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz
When I attempt to correct the line I get:
> rsync -zvh rsync://ftp.ensembl.org/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz $VEP_DATA
@ERROR: Unknown module 'pub'
rsync error: error starting client-server protocol (code 5) at main.c(1653) [Receiver=3.1.1]
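As far as I understand, rsync treats the first path component after the host in an `rsync://` URL as a module name, which would explain the error: in my corrected URL that slot holds `pub`, whereas in the instructions' URL it holds `ensembl`. A minimal sketch of that parsing, where `rsync_module` is a hypothetical helper written only for illustration and the commands that need network access are left commented out:

```shell
# Hypothetical helper: print the module name of an rsync:// URL,
# i.e. the first path component after the host.
rsync_module() {
    url=${1#rsync://}             # drop the scheme
    path=${url#*/}                # drop the host
    printf '%s\n' "${path%%/*}"   # keep only the first component
}

FILE=release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz
rsync_module "rsync://ftp.ensembl.org/pub/$FILE"          # module slot holds "pub"
rsync_module "rsync://ftp.ensembl.org/ensembl/pub/$FILE"  # module slot holds "ensembl"

# Listing the modules the server actually exports (needs network access):
# rsync rsync://ftp.ensembl.org/

# Retrying the instructions' URL with a progress display, to tell a slow
# download apart from a real hang (assumption: the archive is simply large):
# rsync -zvh --progress "rsync://ftp.ensembl.org/ensembl/pub/$FILE" "$VEP_DATA"
```

If that reading is right, the FTP-style path (without `ensembl/`) and the rsync URL (with it) can both be valid, since they are addressed through different protocols.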
I don't know how to work around this problem. How can I fix this and follow the instructions correctly to get VEP and vcf2maf to work together?
So true! This also conflicts with Multiple Alignment Format
And Minor Allele Frequency!