Hello,
I am looking for information about the SNPHistory.bcp.gz file provided by dbSNP (NCBI).
I found a short (too short?) description of the SNPHistory table: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_db_table_description.cgi?t=SNPHistory
But it lacks the detail I need.
In the file I downloaded, if I print the first 10 lines, I can see 5 columns:
$ zhead SNPHistory.bcp.gz
311 2000-09-19 17:02:00.0 2012-11-29 09:31:00.0 2013-11-18 14:52:00.0 SNP-6191
332 2000-09-19 17:02:00.0 2011-01-11 17:12:00.0 2011-05-20 17:31:00.0
471 2000-09-19 17:02:00.0 2012-11-29 09:31:00.0 2013-11-18 14:52:00.0 SNP-6191
668 2000-09-19 17:02:00.0 2014-08-21 18:14:00.0 2014-08-26 00:20:00.0 SNP-6860
730 2000-09-19 17:02:00.0 2012-11-29 09:31:00.0 2013-11-18 14:52:00.0 SNP-6191
743 2000-09-19 17:02:00.0 2012-11-29 09:31:00.0 2013-11-18 14:52:00.0 SNP-6191
744 2000-09-19 17:02:00.0 2012-11-29 09:31:00.0 2013-11-18 14:52:00.0 SNP-6191
745 2000-09-19 17:02:00.0 2012-11-29 09:31:00.0 2013-11-18 14:52:00.0 SNP-6191
799 2000-09-19 17:02:00.0 2012-11-29 09:31:00.0 2013-11-18 14:52:00.0 SNP-6191
840 2000-08-22 15:29:00.0 2000-09-19 14:28:00.0
But if I count the number of fields, I get 6 columns on every line:
$ zcat SNPHistory.bcp.gz | awk -F"\t" '{print NF}' | sort | uniq -c
17390806 6
And if I print the first 10 lines again, showing all characters:
$ zhead SNPHistory.bcp.gz | cat -A
311^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
332^I2000-09-19 17:02:00.0^I2011-01-11 17:12:00.0^I2011-05-20 17:31:00.0^I^I$
471^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
668^I2000-09-19 17:02:00.0^I2014-08-21 18:14:00.0^I2014-08-26 00:20:00.0^ISNP-6860^I$
730^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
743^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
744^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
745^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
799^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
840^I^I2000-08-22 15:29:00.0^I2000-09-19 14:28:00.0^I^I$
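Every line ends with a tab, which is why awk reports a sixth (empty) field. To check which columns are actually populated across the file, something like the following could be used. It is only a sketch: the function name count_nonempty is made up for illustration, it assumes the file is tab-delimited as the cat -A output suggests, and it makes no assumption about what the columns mean.

```shell
# Count, for each tab-separated column, how many rows carry a non-empty value.
# Reads from stdin; count_nonempty is a hypothetical helper name.
count_nonempty() {
    awk -F'\t' '
        {
            if (NF > max) max = NF                    # widest row seen so far
            for (i = 1; i <= NF; i++)
                if ($i != "") filled[i]++             # tally non-empty cells
        }
        END {
            for (i = 1; i <= max; i++)
                printf "col%d\t%d\n", i, filled[i] + 0
        }
    '
}

# Three rows copied from the cat -A output above (rows 311, 332 and 840).
printf '%s\n' \
    "$(printf '311\t2000-09-19 17:02:00.0\t2012-11-29 09:31:00.0\t2013-11-18 14:52:00.0\tSNP-6191\t')" \
    "$(printf '332\t2000-09-19 17:02:00.0\t2011-01-11 17:12:00.0\t2011-05-20 17:31:00.0\t\t')" \
    "$(printf '840\t\t2000-08-22 15:29:00.0\t2000-09-19 14:28:00.0\t\t')" \
    | count_nonempty
```

On these three rows, columns 1 to 6 are filled 3, 2, 3, 3, 1 and 0 times respectively. On the real file the same function would be fed with zcat SNPHistory.bcp.gz | count_nonempty, which should show whether the sixth column is ever non-empty.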
So I wonder: what are those last 2 columns? Could anyone help me?
I need to use this file to find out which SNPs have been suppressed, but I don't know how to interpret those columns...
Thanks in advance