Please, I need directions on how to download the Phase 3 1000 Genomes data for the African population.
Hey brendaumoh6,
If you follow steps 1-5 of my tutorial (Produce PCA bi-plot for 1000 Genomes Phase III - Version 2), you will have the entire phased 1000 Genomes Phase III dataset on your disk, which can be used time and time again for future analyses. The population of each sample is listed in the PED file that you also download; you can use it to filter the data down to just the African samples.
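A minimal Python sketch of that filtering step, assuming the PED file is tab-delimited with a header row and columns named "Individual ID" and "Population" (those column names are assumptions; check them against the file you actually download). The AFR superpopulation in Phase 3 is made up of seven population codes: ACB, ASW, ESN, GWD, LWK, MSL and YRI.

```python
import csv

# The seven 1000 Genomes Phase 3 population codes in the AFR superpopulation.
AFR_POPULATIONS = {"ACB", "ASW", "ESN", "GWD", "LWK", "MSL", "YRI"}


def african_sample_ids(lines, id_col="Individual ID", pop_col="Population"):
    """Return the IDs of samples whose population code belongs to AFR.

    `lines` is any iterable of text lines (for example an open file).
    Assumes a tab-delimited table with a header row; the default column
    names are assumptions and may need changing to match the real PED file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row[id_col] for row in reader if row[pop_col] in AFR_POPULATIONS]
```

To use it, open the PED file with `open(path, newline="")` and pass the handle in; the returned IDs can then be written out one per line as a sample list for whatever tool does the actual subsetting.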
Unfortunately, I am not aware of anybody who has split the 1000 Genomes data into the individual population groups. It's likely something that I would do if I actually had a tenured academic position.
Kevin
As others have noted, the primary-source way to do this is to use a pedigree file provided by 1000 Genomes to filter the full dataset down to just the African samples of interest (which correspond to a superpopulation of "AFR").
A quick alternative is to use the plink2-format fileset posted at https://www.cog-genomics.org/plink/2.0/resources#1kg_phase3 . This includes SuperPop and Population annotations for each sample, so the following command line extracts just the African samples (assuming the .pvar file is still compressed; that's what the 'vzs' refers to):
plink2 --pfile all_phase3 vzs \
--keep-cat-pheno SuperPop \
--keep-cat-names AFR \
--make-pgen \
--out afr_phase3
and you can convert to BCF format with
plink2 --pfile afr_phase3 \
--export bcf
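For reference, `--pfile all_phase3 vzs` makes plink2 look for three files that share the prefix: all_phase3.pgen, all_phase3.pvar.zst (the `vzs` part) and all_phase3.psam. A small Python sketch (the helper names are hypothetical) that lists which of those files are missing before you run the command:

```python
from pathlib import Path


def expected_pfile_paths(prefix, pvar_ext=".pvar.zst"):
    """Files needed by `plink2 --pfile PREFIX vzs`.

    With `vzs`, the variant file is the Zstd-compressed PREFIX.pvar.zst.
    """
    return [Path(prefix + ext) for ext in (".pgen", pvar_ext, ".psam")]


def missing_pfile_paths(prefix, pvar_ext=".pvar.zst"):
    """Return the expected files that do not exist on disk."""
    return [path for path in expected_pfile_paths(prefix, pvar_ext) if not path.exists()]
```

Run from the working directory, `missing_pfile_paths("all_phase3")` should return an empty list once all three files are in place under exactly those names.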
Thanks for your response. I have downloaded the phase3_corrected.psam?dl=1 file from the plink2 website and ran the command:
plink2 --pfile all_phase3 vzs \
--keep-cat-pheno SuperPop \
--keep-cat-names AFR \
--make-pgen \
--out afr_phase3
But I got this error message:
Start time: Wed Apr 29 11:14:10 2020
193440 MiB RAM detected; reserving 96720 MiB for main workspace.
Using up to 16 threads (change this with --threads).
Error: Failed to open all_phase3.pvar.zst?dl=1.pgen : No such file or directory.
How do I resolve this issue?
I get the same error message, but this time it cannot find the ".pgen" file:
Start time: Wed Apr 29 14:59:09 2020
193440 MiB RAM detected; reserving 96720 MiB for main workspace.
Using up to 16 threads (change this with --threads).
Error: Failed to open all_phase3.pgen : No such file or directory.
End time: Wed Apr 29 14:59:09 2020
Out of curiosity, what browser are you using on what operating system, and how are you clicking on the links to download the files? When I click on the links with either Chrome, Firefox, or Safari, across multiple computers, the saved files do not have "?dl=1" at the end of the names.
It was most likely wget. I just tried downloading with wget, and it saves the file with the name the user reported:
wget https://www.dropbox.com/s/qv61mgtx6pz54fz/chr1_phase3.pgen.zst?dl=1
It works via the browser, though.
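When the files are fetched with wget, the "?dl=1" from the URL stays in the saved filenames, so plink2 cannot find the names it expects. A minimal Python sketch (helper names are hypothetical) that strips that suffix from every file in a directory, skipping any rename that would overwrite an existing file:

```python
from pathlib import Path

SUFFIX = "?dl=1"  # URL query string that ends up in filenames saved by wget


def clean_name(name):
    """Return the filename without a trailing '?dl=1', if present."""
    return name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name


def strip_dl_suffix(directory="."):
    """Rename files ending in '?dl=1' in `directory`; return (old, new) name pairs."""
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or not path.name.endswith(SUFFIX):
            continue
        new_name = clean_name(path.name)
        if not new_name:
            continue  # the whole name was the suffix; nothing sensible to rename to
        target = path.with_name(new_name)
        if target.exists():
            continue  # never overwrite a file that is already there
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Calling `strip_dl_suffix(".")` inside the download directory renames, for example, chr1_phase3.pgen.zst?dl=1 to chr1_phase3.pgen.zst. Alternatively, wget's `-O` option lets you choose the output filename at download time (quote the URL so the shell does not treat the "?" as a wildcard).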
Did you take a look at the FAQ provided by the 1000 Genomes Project?
Yes, I did, but all I saw were values; I don't really know which values belong to which population.
https://www.internationalgenome.org/faq/can-i-get-genotypes-specific-individualpopulation-your-vcf-files/