Question

Finding genome build of bed, bim, fam

0

2.5 years ago

Karan ▴ 10

Hi, all -

How can I find the genome build of a PLINK file set? I only have the data in BED, BIM, and FAM format, and I want to confirm that its build matches before I merge it with another dataset. Thanks!

genome • 2.1k views

ADD COMMENT • link updated 2.5 years ago by Ram 45k • written 2.5 years ago by Karan ▴ 10

score 1 · Answer 1 · 2023-02-19

1

2.5 years ago

Istvan Albert 103k

There is no direct encoding of the genome build, but you can sometimes work around that.

The first column in the BIM file lists the chromosome, the second column lists the SNP ID, and the fourth column lists the base-pair position. If the SNP IDs start with "rs", you can look up a SNP in a database like dbSNP to see its genomic location in each genome build, then check which build's coordinate matches the position in your BIM file.
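As a rough illustration of the lookup step, here is a minimal Python sketch that pulls a handful of rs-style IDs and their positions out of BIM lines so they can be checked by hand on dbSNP. The function name `sample_rs_variants` and the toy rows are assumptions made up for this example; they are not part of PLINK or of the original answer.

```python
def sample_rs_variants(bim_lines, limit=5):
    """Return up to `limit` (variant_id, chrom, position) tuples with rs-style IDs.

    BIM columns: chromosome, variant ID, genetic distance,
    base-pair position, allele 1, allele 2.
    """
    picked = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, variant_id, position = fields[0], fields[1], int(fields[3])
        if variant_id.startswith("rs"):
            picked.append((variant_id, chrom, position))
            if len(picked) == limit:
                break
    return picked


# Toy rows with made-up IDs and positions (not real coordinates).
bim_lines = [
    "1 rs_toy1 0 1000 A G",
    "1 array_probe_toy 0 2500 C T",
    "2 rs_toy2 0 4000 G A",
]

for variant_id, chrom, position in sample_rs_variants(bim_lines):
    print(variant_id, chrom, position)
```

With a real file, the lines could come from `open("input.bim")`, where `input.bim` is a placeholder name. IDs that do not start with "rs" are skipped here and would need some other annotation source.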

Along the same lines, two datasets on the same build should share many variant positions, while datasets on different builds should share few.
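The overlap idea can be sketched in a few lines of Python: count, for each candidate build, how many variants in the BIM file sit at exactly the position that build assigns to the same ID. This is only a sketch under assumptions. The `ref_by_build` mapping is hypothetical and would have to be filled from a reference such as dbSNP for each build, and the toy IDs, positions, and build names below are made up.

```python
from collections import Counter


def parse_bim_line(line):
    """Return (chrom, variant_id, position) from one BIM line."""
    fields = line.split()
    return fields[0], fields[1], int(fields[3])


def count_build_matches(bim_lines, ref_by_build):
    """Count, per build, how many BIM variants match that build's position.

    ref_by_build is a hypothetical mapping of the form
        {build_name: {variant_id: (chrom, position)}}
    """
    matches = Counter({build: 0 for build in ref_by_build})
    for line in bim_lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom, variant_id, position = parse_bim_line(line)
        for build, ref in ref_by_build.items():
            if ref.get(variant_id) == (chrom, position):
                matches[build] += 1
    return dict(matches)


# Toy data: made-up IDs, positions, and build names.
bim_lines = [
    "1 rs_toy1 0 1000 A G",
    "1 rs_toy2 0 2500 C T",
    "2 rs_toy3 0 4000 G A",
]
ref_by_build = {
    "buildA": {"rs_toy1": ("1", 1000), "rs_toy2": ("1", 2500), "rs_toy3": ("2", 4100)},
    "buildB": {"rs_toy1": ("1", 1200), "rs_toy2": ("1", 2700), "rs_toy3": ("2", 4000)},
}

print(count_build_matches(bim_lines, ref_by_build))
```

On real data, one build should collect nearly all of the matches. If none does, the file may be on a build that is not in the reference table. Chromosome labels must be written the same way on both sides (for example "1" versus "chr1") for the comparison to work.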

ADD COMMENT • link 2.5 years ago by Istvan Albert 103k

0

Thanks so much for that! A few follow-up questions:

1) One of the SNP IDs is rs1839669. When I look it up on dbSNP, I'm led to the following page: https://www.ncbi.nlm.nih.gov/snp/rs1839669#variant_details. When I navigate to the "Variant Details" section there, I find two options for my build: GRCh37.p13 chr 2 or GRCh38.p14 chr 2. How do I determine which one it is? My base-pair coordinate for that ID is 98157865, which doesn't align with either of the base-pair coordinates in the two options on dbSNP.

2) What happens if the SNP ID doesn't start with "rs"? For instance, in one of my files, the SNP IDs are structured as follows: SNP_A-2242008. How would I determine the genome build then?

Thanks so much for all your help!

ADD REPLY • link 2.5 years ago by Karan ▴ 10

0

Maybe look at another SNP :-)

For a more systematic search, convert the PLINK files to VCF, then load the result in IGV alongside the dbSNP file for each build and compare them there:

plink --bfile input --recode vcf --out output

I think it ought to be clear then what is what.

Perhaps your build is an even earlier one than GRCh37 ...

ADD REPLY • link 2.5 years ago by Istvan Albert 103k