I successfully ran LD pruning from SNPRelate package and obtained the selected SNPs IDs. I would like to create a new GDS file with only the pruned SNPs so I can use it in another software. Codes:
#path to the input VCF file
vcf.fn <- "C:/Users/HP/Desktop/Magic_FavoritePanel_VCF_SNPs.vcf"
# convert the VCF to a SNP GDS file, keeping biallelic SNPs only
snpgdsVCF2GDS(vcf.fn, "panelReformat.gds", method="biallelic.only")
snpgdsSummary("panelReformat.gds")
#open the GDS file
genofile <- snpgdsOpen("panelReformat.gds")
#Perform LD-based SNP pruning
set.seed(1000)
snpset <- snpgdsLDpruning(genofile, ld.threshold=0.2)
names(snpset)
# Final output: IDs of all selected SNPs (snpset is a list with one vector per chromosome)
snpset.id <- unlist(snpset)
Now, `snpset.id` is only a vector of SNP IDs, not a complete dataset. How do I extract from the original GDS file only the data for the selected SNP IDs and save it as a new GDS file? With that file I could work with the data in many ways, e.g. convert it back to VCF and analyze it in TASSEL or STRUCTURE.
Thank you in advance.
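One way to do this inside SNPRelate is `snpgdsCreateGenoSet()`, which copies a chosen subset of samples and/or SNPs from an existing GDS file into a new one. The sketch below is a minimal illustration under stated assumptions: the wrapper name `write_pruned_gds()` is hypothetical, and the demonstration runs on the example file shipped with SNPRelate (`snpgdsExampleFileName()`) rather than on the panel from the question. It closes the source file before copying, since having the same GDS file open twice in one session can raise an error.

```r
library(SNPRelate)  # also attaches gdsfmt (read.gdsn, index.gdsn)

# Hypothetical wrapper: copy only the selected SNPs (all samples) from
# src.fn into a new GDS file, then return the SNP IDs stored in it.
write_pruned_gds <- function(src.fn, dest.fn, snp.id) {
  snpgdsCreateGenoSet(src.fn, dest.fn, snp.id = snp.id, verbose = FALSE)
  f <- snpgdsOpen(dest.fn)
  on.exit(snpgdsClose(f))
  read.gdsn(index.gdsn(f, "snp.id"))
}

# Demonstration on the example GDS file bundled with SNPRelate
example.fn <- snpgdsExampleFileName()
genofile <- snpgdsOpen(example.fn)
set.seed(1000)
snpset <- snpgdsLDpruning(genofile, ld.threshold = 0.2, verbose = FALSE)
snpset.id <- unlist(unname(snpset))
snpgdsClose(genofile)  # close the source before copying from it

pruned.fn <- tempfile(fileext = ".gds")
kept <- write_pruned_gds(example.fn, pruned.fn, snpset.id)
```

With the question's objects this would be `snpgdsClose(genofile)` followed by `write_pruned_gds("panelReformat.gds", "panelPruned.gds", snpset.id)`. For the VCF route, SeqArray's `seqSNP2GDS()` and `seqGDS2VCF()` are one option, and SNPRelate's `snpgdsGDS2PED()` / `snpgdsGDS2BED()` export PLINK files; check their documentation for the exact arguments.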
You could subset your VCF (`C:/Users/HP/Desktop/Magic_FavoritePanel_VCF_SNPs.vcf`) to the SNPs stored in `snpset.id` and then create the GDS file again, if you don't want to follow what James suggested.
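The VCF-subsetting route can be sketched in base R plus SNPRelate. One assumption to flag: after `snpgdsVCF2GDS()`, the GDS `snp.id` values are, as far as I know, sequential indices rather than the VCF `ID` column, so this sketch matches records by chromosome and position instead. It strips a leading `chr` prefix on both sides (by default `snpgdsVCF2GDS()` ignores that prefix) and assumes chromosome names otherwise agree. The helper `subset_vcf_by_pruned()` is hypothetical and reads the whole VCF into memory, so it suits moderately sized files only.

```r
library(SNPRelate)  # also attaches gdsfmt (read.gdsn, index.gdsn)

# Hypothetical helper: keep only the VCF records whose chromosome:position
# matches one of the selected SNPs in the GDS file created from that VCF.
subset_vcf_by_pruned <- function(gds.fn, vcf.fn, out.fn, snp.id) {
  # allow.duplicate = TRUE: the same file may already be open in the session
  genofile <- snpgdsOpen(gds.fn, allow.duplicate = TRUE)
  on.exit(snpgdsClose(genofile))

  # chromosome:position keys of the selected SNPs (assumption: names agree
  # between GDS and VCF once a leading "chr" is removed)
  all.id <- read.gdsn(index.gdsn(genofile, "snp.id"))
  chr <- read.gdsn(index.gdsn(genofile, "snp.chromosome"))
  pos <- read.gdsn(index.gdsn(genofile, "snp.position"))
  sel <- all.id %in% snp.id
  keep <- paste(sub("^chr", "", chr[sel]), pos[sel], sep = ":")

  # Read the VCF as text; header lines start with "#"
  lines <- readLines(vcf.fn)
  is.header <- startsWith(lines, "#")
  body <- lines[!is.header]
  fields <- strsplit(body, "\t", fixed = TRUE)
  key <- vapply(fields,
                function(f) paste(sub("^chr", "", f[1]), f[2], sep = ":"),
                character(1))

  # Write the header plus the matching records, in their original order
  hit <- key %in% keep
  writeLines(c(lines[is.header], body[hit]), out.fn)
  invisible(sum(hit))
}
```

With the question's objects, a call would look like `subset_vcf_by_pruned("panelReformat.gds", vcf.fn, "panel_pruned.vcf", snpset.id)`, followed by `snpgdsVCF2GDS("panel_pruned.vcf", "panelPruned.gds", method="biallelic.only")` to rebuild the GDS file.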