SNP count in a set of genomic regions
2
0
Entering edit mode
3.7 years ago
gokberk ▴ 90

Hi all,

I have a set of genomic regions as .bed files. I need to get the number of 1000G phase 3 SNPs which fall within these regions. I use the code below to do this and also to make .bim files:

for i in $bed/*.bed; do
        tmp_bed=$(basename "$i")

        plink --bfile ${plinkDir}/1000G.EUR.QC \
                --make-just-bim --allow-no-sex \
                --extract range "${i}" --out ${outDir}/1000G.EUR.QC_allchr.${tmp_bed};
done

Log file looks as below:

Random number seed: 1619784243
225648 MB RAM detected; reserving 112824 MB for main workspace.
9997231 variants loaded from .bim file.
489 people (0 males, 0 females, 489 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
/data/1000G.EUR.QC_allchr.annotation.bed.nosex
.
--extract range: 9949062 variants excluded.
--extract range: 48169 variants remaining.
Using 1 thread (no multit hreaded calculations invoked).
Before main variant filters, 489 founders and 0 nonfounders present.
Calculating allele frequencies... done.
48169 variants and 489 people pass filters and QC.
Note: No phenotypes present.
--make-just-bim to
/data/1000G.EUR.QC_allchr.annotation.bed.bim
... done.

I was wondering if this is the right way to do it and if I can conclude that 48,169 SNPs were found within my annotation.

Thanks!

annotations SNPs 1000G • 1.1k views
ADD COMMENT
1
Entering edit mode
3.7 years ago

For comparison, perhaps:

$ bedmap --echo --count --delim '\t' <(sort-bed regions.bed) <(vcf2bed < variants.vcf) > answer.bed

That will give the number of VCF records found within each region.

Reference:

ADD COMMENT
1
Entering edit mode
3.7 years ago

You can count sites querying phase 3 sites directly, although if you have lots of bed files to query it could be better to download the 1000g sites file and run the loop locally:

sites="ALL.wgs.phase3_shapeit2_mvncall_integrated_v5c.20130502.sites.vcf.gz" # in case you download the file; otherwise, use the full URL
sites="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5c.20130502.sites.vcf.gz"
for i in $bed/*.bed; do
  echo -n "$i "
  bcfools view -HR $i $sites | wc -l
done
ADD COMMENT

Login before adding your answer.

Traffic: 1679 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6