Top/Bottom Confusion For Illumina SNP Calls
14.2 years ago
Perry ▴ 290

Here is the problem: Illumina calls their SNPs AA, AB, BB. The meaning of A and B depends on what they call the "top" or "bottom" strand. One of the problems I am facing is that I don't have the original data. All I have is the Illumina SNP processed file with the SNP number and genotype call (AA, AB, BB). These calls should be uniquely translatable into nucleotides.

1) let's assume for a moment that the SNP calls are from an ILMN_Human_1M chip

2) let's say for rs13536 I have a call of BB

3) what nucleotides does this correspond to on the positive strand of the reference genome?

According to Illumina:

Top Strand, Bottom Strand

1: A-G , T-C

2: A-C , T-G

So if I go to dbSNP for rs13536, and I see T/C, I'm dealing with the bottom strand, and I can use this to get the nucleotides.
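The lookup described above can be sketched in a few lines of Python. This is only an illustration, not Illumina's or dbSNP's code: the function names are hypothetical, and the A/B mapping (allele A is the A or T nucleotide, allele B is the C or G nucleotide for unambiguous SNPs) is an assumption taken from Illumina's TOP/BOT convention rather than from anything stated in this post. The result is on whatever strand the observed alleles are reported on, which is not necessarily the positive strand of the reference genome.

```python
# Unambiguous allele pairs from the table above:
# A/G and A/C are TOP; T/C and T/G are BOT.
TOP_PAIRS = {frozenset("AG"), frozenset("AC")}
BOT_PAIRS = {frozenset("TC"), frozenset("TG")}


def illumina_strand(observed):
    """Return 'TOP', 'BOT', or None for an observed string such as 'T/C'."""
    alleles = frozenset(observed.upper().split("/"))
    if alleles in TOP_PAIRS:
        return "TOP"
    if alleles in BOT_PAIRS:
        return "BOT"
    # A/T and C/G SNPs are ambiguous and need the flanking sequence.
    return None


def ab_to_nucleotides(call, observed):
    """Translate an AA/AB/BB call into nucleotides on the observed strand.

    Assumption (hypothetical helper): allele A is the A or T nucleotide,
    allele B is the C or G nucleotide.
    """
    if illumina_strand(observed) is None:
        raise ValueError("ambiguous SNP, cannot translate from alleles alone")
    alleles = observed.upper().split("/")
    allele_a = next(n for n in alleles if n in "AT")
    allele_b = next(n for n in alleles if n in "CG")
    return "".join({"A": allele_a, "B": allele_b}[c] for c in call.upper())


print(illumina_strand("T/C"))          # strand implied by the dbSNP alleles
print(ab_to_nucleotides("BB", "T/C"))  # nucleotides for the BB call
```

Under these assumptions, a BB call for a T/C SNP would come out as CC on the bottom strand; mapping that to the reference plus strand still needs the orientation of the assay relative to the genome.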

I see that I can solve my problem by determining if the call is top or bottom, by following these instructions:

1 You can compute the top/bottom designation yourself using the data in the /organisms/human_9606/GWAS_arrays/ directory on the dbSNP FTP site.

2 You can look at dbSNP's top/bottom assignment, which you can access if you download the SubSNP.bcp file located in the /database/organism_data/ directory for human. The field that includes the top/bottom data is called SubSNP.top_or_bot_strand. You can access the table DDL for SubSNP in the /database/organism_schema directory.

I do both to make sure my answers are consistent. I grab:

1) ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/GWAS_arrays/ILLUMINA.ILLUMINA_Human_1M.xml.gz

2) ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/SubSNP_top_or_bot.bcp.gz

In ILLUMINA.ILLUMINA_Human_1M.xml, rs13536 is top:

<ss batchid="33668" buildid="127" handle="ILLUMINA" linkouturl="http://www.illumina.com/products/arraysreagents/wgghuman1.ilmnHuman1-rs13536" locsnpid="Human1-rs13536" methodclass="other" moltype="genomic" orient="forward" ssid="65715089" strand="top" subsnpclass="snp" validated="by-submitter">
        <Sequence>
            <Seq5>TTTCGAACCGAGACAGATGGCAGCTAAATGAAGTTTAATTAAAGAATGAG</Seq5>
            <Observed>C/T</Observed>
            <Seq3>GCTGGGGCCCTTTTTATTGGGTACTGCATCTACTTCGACCACAAAAGACG</Seq3>
        </Sequence>

But Illumina states that C/T is bottom. Why is it top here?
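This kind of mismatch can be checked mechanically. The sketch below is a rough illustration only: it parses a cut-down copy of the `<ss>` record (with a closing `</ss>` tag added so it is well-formed, and most attributes dropped), reads the declared `strand` attribute and the `<Observed>` alleles, and compares the declared strand with the one implied by the allele table earlier in this post. The function names are hypothetical, and the real file may need namespace handling or streaming that is not shown here.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the record quoted above; the closing </ss> is added
# here (assumption) so that the snippet parses on its own.
SNIPPET = """<ss locsnpid="Human1-rs13536" orient="forward" ssid="65715089" strand="top">
  <Sequence>
    <Observed>C/T</Observed>
  </Sequence>
</ss>"""


def implied_strand(observed):
    """Strand implied by the allele pair alone, or None if ambiguous."""
    alleles = set(observed.upper().split("/"))
    if alleles in ({"A", "G"}, {"A", "C"}):
        return "top"
    if alleles in ({"T", "C"}, {"T", "G"}):
        return "bottom"
    return None


def check_ss(xml_text):
    """Return (declared strand, implied strand, whether they agree)."""
    ss = ET.fromstring(xml_text)
    declared = ss.get("strand")
    observed = ss.findtext("Sequence/Observed")
    implied = implied_strand(observed)
    return declared, implied, declared == implied


print(check_ss(SNIPPET))
```

For the record above this reports a declared strand of top against an implied strand of bottom, which is exactly the conflict in question; it does not say which of the two is right.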

In SubSNP_top_or_bot.bcp, rs13536 is bottom, which is consistent with C/T:

13536 B 5

Why is there a conflict between the files?

dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs13536) shows bottom for both ILLUMINA assays. Why is ILLUMINA.ILLUMINA_Human_1M.xml in conflict with these?

14.2 years ago
Jan Oosting ▴ 920

Illumina has a technote on their naming convention of SNPs: “TOP/BOT” Strand and “A/B” Allele

Thanks. It seems to me that this naming convention conflicts with what is presented in ILLUMINA.ILLUMINA_Human_1M.xml. In the naming convention file, rs536477 is A/G and TOP. However, the rs536477 entries in the XML file are A/G and strand='bottom'. Does strand have a different meaning in the XML file?
