dbSNP SNPHistory file: what does each column correspond to?
9.2 years ago

Hello,

I am looking for information about the SNPHistory.bcp.gz file provided by dbSNP (NCBI).

I found a short (too short?) description of the SNPHistory table: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_db_table_description.cgi?t=SNPHistory

The problem is that it does not explain the columns in enough detail.

In the file I downloaded, if I print the first 10 lines, I can see 5 columns:

$ zcat SNPHistory.bcp.gz | head
311    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191
332    2000-09-19 17:02:00.0    2011-01-11 17:12:00.0    2011-05-20 17:31:00.0
471    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191
668    2000-09-19 17:02:00.0    2014-08-21 18:14:00.0    2014-08-26 00:20:00.0    SNP-6860
730    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191
743    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191
744    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191
745    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191
799    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191
840        2000-08-22 15:29:00.0    2000-09-19 14:28:00.0

But if I count the number of fields, I get 6 columns:

$ zcat SNPHistory.bcp.gz | awk -F"\t" '{print NF}' | sort | uniq -c
17390806 6

And if I print the first 10 lines again, showing all characters:

$ zcat SNPHistory.bcp.gz | head | cat -A
311^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
332^I2000-09-19 17:02:00.0^I2011-01-11 17:12:00.0^I2011-05-20 17:31:00.0^I^I$
471^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
668^I2000-09-19 17:02:00.0^I2014-08-21 18:14:00.0^I2014-08-26 00:20:00.0^ISNP-6860^I$
730^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
743^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
744^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
745^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
799^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
840^I^I2000-08-22 15:29:00.0^I2000-09-19 14:28:00.0^I^I$

So I wonder: what are those last 2 columns? Could anyone help me?

I need to use this file to find out which SNPs are suppressed, but I don't know how to interpret those columns...

Thanks in advance

9.2 years ago
Max Ivon ▴ 140

According to the database schema (available on the dbSNP FTP site):

CREATE TABLE [SNPHistory]
(
[snp_id] [int] NOT NULL ,
[create_time] [smalldatetime] NULL ,
[last_updated_time] [smalldatetime] NOT NULL ,
[history_create_time] [smalldatetime] NULL ,
[comment] [varchar](255) NULL ,
[reactivated_time] [smalldatetime] NULL
)

I think you can find more information about this file in this FAQ: http://www.ncbi.nlm.nih.gov/books/NBK44468/. As stated there, SNPHistory contains only deleted mutations, so if you want to get the suppressed mutations, just take the lines where reactivated_time is not defined.
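
To make that filter concrete, here is a minimal Python sketch. It assumes the six tab-separated fields in SNPHistory.bcp.gz appear in the same order as the columns in the CREATE TABLE statement above, and that "not defined" means the sixth field is empty, as in the `cat -A` output from the question. The names `SNPHistoryRow`, `parse_snp_history`, `suppressed_snp_ids` and `suppressed_snp_ids_from_file` are made up for illustration and are not part of any dbSNP tool.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, Optional


class SNPHistoryRow(NamedTuple):
    # Field order is an assumption: it mirrors the CREATE TABLE statement.
    snp_id: int
    create_time: Optional[str]
    last_updated_time: Optional[str]
    history_create_time: Optional[str]
    comment: Optional[str]
    reactivated_time: Optional[str]


def parse_snp_history(lines: Iterable[str]) -> Iterator[SNPHistoryRow]:
    """Yield one SNPHistoryRow per non-empty line; empty fields become None."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
        snp_id, *rest = fields
        yield SNPHistoryRow(int(snp_id), *[value or None for value in rest])


def suppressed_snp_ids(lines: Iterable[str]) -> Iterator[int]:
    """Yield snp_id for rows whose reactivated_time field is empty."""
    for row in parse_snp_history(lines):
        if row.reactivated_time is None:
            yield row.snp_id


def suppressed_snp_ids_from_file(path: str) -> Iterator[int]:
    """Same filter, reading a gzip-compressed file such as SNPHistory.bcp.gz."""
    with gzip.open(path, "rt") as handle:
        yield from suppressed_snp_ids(handle)
```

Calling `suppressed_snp_ids_from_file("SNPHistory.bcp.gz")` would then stream the rs numbers of entries that were never reactivated. It is worth checking the column order against the current schema file before relying on the result.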
