Hi all,
I have a set of genomic regions stored as .bed files, and I need to count the 1000G phase 3 SNPs that fall within these regions. I use the code below to do this and to write the corresponding .bim files:
for i in "$bed"/*.bed; do
    # Name each output after its input annotation file
    tmp_bed=$(basename "$i")
    plink --bfile "${plinkDir}/1000G.EUR.QC" \
        --make-just-bim --allow-no-sex \
        --extract range "$i" \
        --out "${outDir}/1000G.EUR.QC_allchr.${tmp_bed}"
done
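One possible caveat with passing .bed files straight to --extract range: PLINK's set-range format uses 1-based, inclusive start and end positions plus a set ID in the fourth column, whereas UCSC-style BED starts are 0-based. If the annotation files follow the UCSC convention (an assumption, since the files are not shown here), a conversion along these lines may be needed first. The function name bed_to_plink_range and the fallback IDs R1, R2, ... are made up for illustration.

```shell
# Sketch: convert UCSC-style BED (0-based start, end-exclusive) on stdin
# into PLINK set-range format (1-based, inclusive start and end) on stdout.
# Assumes tab-separated input; bed_to_plink_range is a hypothetical helper name.
bed_to_plink_range() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^(track|browser|#)/ { next }   # skip BED header/comment lines
         {
             # Use the BED name column as set ID, or a placeholder if absent
             id = ($4 != "" ? $4 : "R" NR)
             # Shift only the start; the BED end already equals the
             # last included base in 1-based coordinates
             print $1, $2 + 1, $3, id
         }'
}
```

It could be used as `bed_to_plink_range < annotation.bed > annotation.range`, with the resulting file given to --extract range. If the .bed files are already 1-based, this step should be skipped.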
The log file looks like this:
Random number seed: 1619784243
225648 MB RAM detected; reserving 112824 MB for main workspace.
9997231 variants loaded from .bim file.
489 people (0 males, 0 females, 489 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
/data/1000G.EUR.QC_allchr.annotation.bed.nosex
.
--extract range: 9949062 variants excluded.
--extract range: 48169 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 489 founders and 0 nonfounders present.
Calculating allele frequencies... done.
48169 variants and 489 people pass filters and QC.
Note: No phenotypes present.
--make-just-bim to
/data/1000G.EUR.QC_allchr.annotation.bed.bim
... done.
I was wondering if this is the right way to do it and if I can conclude that 48,169 SNPs were found within my annotation.
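As a cross-check on the number reported in the log, the variants can also be counted directly from the output, since a .bim file has one line per variant. Below is a minimal sketch, assuming outDir is the same output directory as in the loop above; count_bim_variants is just an illustrative helper name.

```shell
# Sketch: report the number of variants in each .bim written by the loop.
# A .bim file has one line per variant, so line count = variant count.
count_bim_variants() {
    # Print the number of lines in the given .bim file, without padding
    wc -l < "$1" | tr -d '[:space:]'
}

# outDir is assumed to be set as in the plink loop; falls back to "."
for bim in "${outDir:-.}"/1000G.EUR.QC_allchr.*.bim; do
    [ -e "$bim" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$(basename "$bim")" "$(count_bim_variants "$bim")"
done
```

The count printed for each file should match the "variants remaining" line in the corresponding PLINK log.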
Thanks!