Hello
I'm performing my first GWAS and am trying to plot the principal components of my data, anchoring them to a reference dataset by ancestry. However, I'm having a very hard time finding a reference panel on build 38 with individual-level data, rsIDs, and ancestry labels.
I have only been able to access older builds of the reference data. When I tried lifting those over to hg38, I had to create a plain-text BED file for liftOver, and I'm now having a hard time converting that text file back into the binary fileset that the PLINK PCA commands need.
Does anyone know how I could access the 1000 Genomes Project data on the GRCh38 build, preferably in PLINK or VCF format? This would make my pipeline much more streamlined.
Not sure if this is what you need, but the GATK resource bundle includes 1000 Genomes phase 3 data in VCF format. Check the Google Cloud bucket link on their help page: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle
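If you do end up lifting over an older panel yourself, the lifted text BED file does not need to be converted back into a binary file directly. A rough sketch of one alternative, in Python: turn the lifted file into small lookup tables keyed by variant ID, then let PLINK apply them to the original binary fileset. This assumes the text BED was written with four columns (chrom, start, end, variant ID) and start = position - 1, so the end column is the 1-based hg38 position. The function and file names are made up for illustration.

```python
def lifted_bed_to_updates(bed_lines):
    """Map variant ID -> (chrom, 1-based position) from liftOver output rows.

    Assumes 4-column BED rows: chrom, start, end, variant ID,
    where end is the 1-based position (start = position - 1).
    """
    updates = {}
    ambiguous = set()
    for line in bed_lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 4:
            continue  # skip comments and malformed rows
        chrom, _start, end, variant_id = fields[:4]
        if "_" in chrom:
            continue  # skip alt/random/unplaced contigs such as chr1_..._random
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # "chr1" -> "1"
        if variant_id in updates or variant_id in ambiguous:
            # The same ID lifted to more than one place: drop it entirely.
            updates.pop(variant_id, None)
            ambiguous.add(variant_id)
            continue
        updates[variant_id] = (chrom, int(end))
    return updates


def write_update_files(updates, prefix):
    """Write three two-column/one-column text files (hypothetical names)."""
    with open(prefix + ".chr.txt", "w") as chr_out, \
         open(prefix + ".pos.txt", "w") as pos_out, \
         open(prefix + ".keep.txt", "w") as keep_out:
        for variant_id, (chrom, pos) in updates.items():
            chr_out.write(f"{variant_id}\t{chrom}\n")   # variant ID, new chromosome
            pos_out.write(f"{variant_id}\t{pos}\n")     # variant ID, new bp position
            keep_out.write(f"{variant_id}\n")           # variants that lifted cleanly
```

You would call `lifted_bed_to_updates(open("lifted.bed"))` and then `write_update_files(...)`. The resulting files are meant to be passed to PLINK 1.9's `--extract`, `--update-chr` and `--update-map` options together with `--make-bed`, which writes a new binary fileset on the new coordinates. Check the PLINK documentation for the exact column order those options expect before relying on this. Note that liftOver only moves coordinates; it does not fix strand or reference-allele changes between builds.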