How to get ancestral SNP states for GRCh37.p13
10.1 years ago
JMR ▴ 160

For a positive selection test that I want to use, I need the ancestral states of all SNPs present in my data.

I checked this FAQ from NCBI, followed the instructions and downloaded a file that contains the rs number, physical position and ancestral state of over 60 million SNPs. However, as a simple test, when I tried to match some SNPs present in my data by rs number and physical position, I didn't get any match. But when I entered a SNP on the dbSNP website, I could find it with a putative ancestral state and a matching physical position.

The last update of the downloaded file was in March 2014, but I couldn't find a reference to the build.

Are there other places where I could get the ancestral states of SNPs? Or find an updated file from dbSNP?

Thank you in advance.

EXTRA INFORMATION FOR COMMENT

Example of an rsSNP in my data:

This is an rsSNP present in my data with its physical position based on the GRCh37 assembly.

rs2823639 17576565

When I check the SNPAncestralAllele.bcp.gz file for this rsSNP I get these matches:

rs2823639    0    A
rs2823639    1050982    A
rs2823639    1052591    A
rs2823639    1056295    A
rs2823639    1056571    A
rs2823639    1061835    A

The information on the dbSNP website is however this:

GRCh38            16204245
GRCh37.p13        17576565
Ancestral allele: A

The ancestral state is the same but the physical position is not.

snp • 4.7k views
ADD COMMENT

Can you post some sample rs#s from your dataset? Also what is the name of the file you downloaded?

Did you have a look at this instruction for getting ancestral SNP state?

http://www.ncbi.nlm.nih.gov/sites/books/NBK44409/#Build.how_do_i_download_a_flat_file_that

ADD REPLY

Yes I checked the instructions and downloaded two files: Allele.bcp.gz and SNPAncestralAllele.bcp.gz.

See the edited question for an example rs# from my data.

Thanks for the help!

ADD REPLY

I am not sure we are seeing the same SNPAncestralAllele file.

The column definitions for the SNPAncestralAllele file, from human_9606_table.sql, are:

CREATE TABLE [SNPAncestralAllele]
(
[snp_id] [int] NOT NULL ,
[ancestral_allele_id] [int] NOT NULL ,
[batch_id] [int] NOT NULL
)
GO

The second column in the table you posted is not the chromosomal position but the batch_id:

rs2823639    0    A
rs2823639    1050982    A
rs2823639    1052591    A
rs2823639    1056295    A
rs2823639    1056571    A
rs2823639    1061835    A

The chromosome position can be obtained from the b142_SNPChrPosOnRef_106.bcp file (for GRCh38). The column definitions for this file (again from human_9606_table.sql) are:

CREATE TABLE [b142_SNPChrPosOnRef_106]
(
[snp_id] [int] NOT NULL ,
[chr] [varchar](32) NOT NULL ,
[pos] [int] NULL ,
[orien] [int] NULL ,
[neighbor_snp_list] [int] NULL ,
[isPAR] [varchar](1) NOT NULL
)
GO

The chromosome position for rs2823639 from b142_SNPChrPosOnRef_106.bcp file is

2823639    21    16204244    0

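The reason for the -1 difference between the chromosome position in the .bcp file and the one shown on the dbSNP website (16204244 vs. 16204245) is explained here. In short, positions in the .bcp file are 0-based, while the website reports 1-based positions.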

The FTP files I linked are for GRCh38. You can get the corresponding files for GRCh37.p13 here
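To make the position check concrete, here is a minimal Python sketch that reads SNPChrPosOnRef-style rows and compares them with a 1-based position from your own data. It assumes the .bcp file is tab-delimited with columns in the order of the CREATE TABLE statement above, and it treats the stored position as 0-based, which is what the -1 difference suggests. The function names `load_positions` and `position_matches` are made up for this illustration. The sample row is the GRCh38 one quoted above; for GRCh37.p13 you would read the corresponding GRCh37 file instead.

```python
import csv
import io


def load_positions(handle):
    """Return {snp_id: (chr, 1-based position)} from SNPChrPosOnRef-style rows."""
    positions = {}
    for row in csv.reader(handle, delimiter="\t"):
        # pos is nullable in the schema, so skip rows without a position.
        if len(row) < 3 or not row[2].strip():
            continue
        # Assumption: the file stores 0-based positions, so add 1.
        positions[int(row[0])] = (row[1], int(row[2]) + 1)
    return positions


def position_matches(rs_name, chrom, pos_1based, positions):
    """True/False if the snp_id is present, None if it is missing."""
    snp_id = int(rs_name[2:]) if rs_name.lower().startswith("rs") else int(rs_name)
    if snp_id not in positions:
        return None
    return positions[snp_id] == (chrom, pos_1based)


# The row quoted above, assumed to be tab-delimited in the real file.
sample = io.StringIO("2823639\t21\t16204244\t0\n")
positions = load_positions(sample)

# 16204245 is the GRCh38 position shown on the dbSNP website.
print(position_matches("rs2823639", "21", 16204245, positions))
```

For a real file you would pass an open (possibly gzip-decompressed) text handle to `load_positions` instead of the in-memory sample.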

ADD REPLY

Thank you so much, Siva! I downloaded the new files for GRCh37 and will try to match my rs numbers and physical positions to them. I have another question though: b142_SNPChrPosOnRef_105.bcp and SNPAncestralAllele.bcp have different numbers of rows. Shouldn't they be the same?

ADD REPLY

You are welcome. The b142_SNPChrPosOnRef_105.bcp file has a unique row (chromosome position) for each snp_id, whereas there can be more than one row (multiple submissions/batch_ids) for the same snp_id in the SNPAncestralAllele.bcp file. In the example from your original post, there are 6 batch_ids for 1 snp_id.
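As a rough illustration of why the row counts differ, the sketch below counts batches per snp_id. It uses the six rows from the original post, written as (snp_id, batch_id, allele) tuples in the order they were posted there; that layout and the function name `batches_per_snp` are assumptions for this example, not the raw layout of SNPAncestralAllele.bcp.

```python
from collections import defaultdict

# The six rows posted in the question: (snp_id, batch_id, ancestral allele).
rows = [
    (2823639, 0, "A"),
    (2823639, 1050982, "A"),
    (2823639, 1052591, "A"),
    (2823639, 1056295, "A"),
    (2823639, 1056571, "A"),
    (2823639, 1061835, "A"),
]


def batches_per_snp(rows):
    """Return {snp_id: number of distinct batch_ids}."""
    batches = defaultdict(set)
    for snp_id, batch_id, _allele in rows:
        batches[snp_id].add(batch_id)
    return {snp_id: len(ids) for snp_id, ids in batches.items()}


counts = batches_per_snp(rows)
print(len(rows), "rows,", len(counts), "unique snp_id(s)")
```

Comparing the number of unique snp_ids, rather than the number of rows, against the position file is the fairer check.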

ADD REPLY

Thank you so much! This solved all my questions!

ADD REPLY

Hi Siva, I just encountered another problem. For several rsSNPs I found that different batches point to different ancestral alleles. Which batch number should I trust? The latest one? I searched for information on batches on the dbSNP website but couldn't find anything.
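A neutral first step, whichever batch ends up being trusted, is to list the SNPs whose batches disagree so they can be inspected or excluded. This is only a sketch: the (snp_id, batch_id, allele) tuple layout and the function name `conflicting_snps` are assumptions, and the rows for snp_id 1 are hypothetical values invented to show a conflict, not real dbSNP records.

```python
from collections import defaultdict


def conflicting_snps(rows):
    """Return {snp_id: sorted alleles} for snp_ids with more than one allele."""
    alleles = defaultdict(set)
    for snp_id, _batch_id, allele in rows:
        alleles[snp_id].add(allele.upper())
    return {s: sorted(a) for s, a in alleles.items() if len(a) > 1}


rows = [
    # Two of the rows posted in the question: both batches agree on A.
    (2823639, 0, "A"),
    (2823639, 1050982, "A"),
    # HYPOTHETICAL rows (not real dbSNP data): two batches that disagree.
    (1, 10, "C"),
    (1, 11, "T"),
]

conflicts = conflicting_snps(rows)
print(conflicts)
```

This does not decide which batch is correct; it only separates the consistent SNPs from the ones that need a closer look.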

ADD REPLY
10.0 years ago
Jie Ping ▴ 40

You can find 1kg ancestral alleles (actually derived from Ensembl) here.

ADD COMMENT

I am going to try that after checking how many SNPs in the dbSNP dataset have ancestral information. Thanks.

ADD REPLY

I went to the link for the 1kg ancestral alleles for GRCh37 but only found the question and answer page. Can you give more details about how to find the file? ... I think I found it. I see SNPAncestralAllele.bcp.gz here: https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/organism_data/.

ADD REPLY
