SNPs That Discriminate Mouse Strains in dbSNP
14.1 years ago

I am designing a genotype panel to distinguish several Mus musculus strains (FVB/N, 129/SV, C57BL/6) from Mus spretus at ~400 loci spread through the mouse genome. Illumina sells a mouse genotype panel, but it is not designed against spretus, and that's the primary requirement. It would be convenient but not required to identify polymorphisms that are already in dbSNP.

Jax Informatics has a nice query interface apparently back-ended by dbSNP, but I can't see any way to ask it to show me only loci where there is a known genotype for all four strains, and where spretus is distinct from the other three strains. I know the information I want must be in dbSNP. I don't think I can download the polymorphisms from Jax, presumably because they get them from dbSNP. I could pull down a lot of queries from Jax and grind over it, but that's not very elegant. Any ideas on how to solve this in code?

EDITED TO SHOW AN EXAMPLE:

rs4222137, at chr1, 4,678,222 is G for Mus spretus and A for B6, 129, and FVB (reverse strand). Output ideally would be the rs ID, the location, and the genotypes:

SNP       CHR  LOCATION B6 129 FVB SPR
rs4222137 chr1 4678222   A   A   A   G
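
The selection rule described above (all four strains genotyped, the three musculus strains identical, spretus different) could be sketched in Python roughly as follows. The function name and the strain keys are made up for illustration and are not part of dbSNP.

```python
def spretus_is_distinct(genotypes, spretus="SPR", others=("B6", "129", "FVB")):
    """Return True if all four strains are genotyped, the three
    musculus strains agree, and spretus differs from them."""
    needed = (spretus,) + tuple(others)
    # Reject loci where any of the four strains has no known genotype
    if any(genotypes.get(strain) is None for strain in needed):
        return False
    other_calls = {genotypes[strain] for strain in others}
    return len(other_calls) == 1 and genotypes[spretus] not in other_calls


# The rs4222137 example from the question
example = {"B6": "A", "129": "A", "FVB": "A", "SPR": "G"}
print(spretus_is_distinct(example))
```

A locus with a missing genotype for any strain is rejected rather than guessed at, which matches the "known genotype for all four strains" requirement.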

The XML files that Peter linked may solve it. They have the form:

 <SnpInfo rsId="3023491" observed="C/T">
     <SsInfo ssId="4319850" locSnpId="X86368_367C_4" ssOrientToRs="fwd">
        <ByPop popId="1064" hwProb="0.001" hwChi2="15" hwDf="1" sampleSize="30">
            <GTypeByInd gtype="T/T" indId="2920"/>
            <GTypeByInd gtype="C/C" indId="4464"/>
            ...

Where I think 2920 is C3H/HEJ and 4464 is CAST/EI (these are mouse strains). Time to break out the SAX parser...

genotyping mouse dbsnp snp • 4.2k views

In this case, don't use SAX; use StAX instead. Also, it is still not clear to me: the three musculus strains are FVB/N, 129/SV and C57BL/6, but what is the strain for spretus?


Don't use SAX here; use StAX (Streaming API for XML) instead.
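
StAX is a Java API, but the same pull-style idea exists in Python's standard library as xml.etree.ElementTree.XMLPullParser. Here is a minimal sketch of it, assuming the element and attribute names quoted in the question. The sample document is made up: the <Doc> root is a placeholder, and the closing tags the question elided are filled in only so the sample is well-formed.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape quoted in the question; <Doc> is a placeholder root
SAMPLE = """<Doc>
  <SnpInfo rsId="3023491" observed="C/T">
    <SsInfo ssId="4319850" locSnpId="X86368_367C_4" ssOrientToRs="fwd">
      <ByPop popId="1064" hwProb="0.001" hwChi2="15" hwDf="1" sampleSize="30">
        <GTypeByInd gtype="T/T" indId="2920"/>
        <GTypeByInd gtype="C/C" indId="4464"/>
      </ByPop>
    </SsInfo>
  </SnpInfo>
</Doc>"""


def iter_snp_genotypes(chunks):
    """Yield (rs id, {indId: gtype}) for each SnpInfo as data is fed in."""
    parser = ET.XMLPullParser(events=("start", "end"))
    rs_id, calls = None, {}
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if any
            if event == "start" and tag == "SnpInfo":
                rs_id, calls = elem.get("rsId"), {}
            elif event == "end" and tag == "GTypeByInd":
                calls[elem.get("indId")] = elem.get("gtype")
            elif event == "end" and tag == "SnpInfo":
                yield "rs" + rs_id, calls
                elem.clear()  # drop the finished subtree to limit memory use


# Feed the document in small pieces, as one would when reading a large file
pieces = (SAMPLE[i:i + 64] for i in range(0, len(SAMPLE), 64))
for rs, calls in iter_snp_genotypes(pieces):
    print(rs, calls)
```

For a real file, the chunks could come from repeated fh.read(65536) calls, so the whole document never has to sit in memory.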

14.1 years ago

Everything you need seems to be in the XML files from ftp://ftp.ncbi.nih.gov/snp/organisms/mouse_10090/genotype_by_gene and/or ftp://ftp.ncbi.nih.gov/snp/organisms/mouse_10090/genotype.

I'm not used to working with murine SNPs (what are the strains for Mus spretus/musculus?). What kind of output do you need? Can you show us an example with a specific rs##?

EDITED: Here is an answer in Java. As it is not clear to me which strains are to be compared or which reference assembly to use, I'm just printing all the genotypes and positions for each rs.

First, some Java sources must be generated from the XSD schema using ${JAVA_HOME}/bin/xjc:

xjc -d . ftp://ftp.ncbi.nlm.nih.gov/snp/specs/genoex_1_5.xsd

Then the program loops over each chromosome in dbSNP, parses each 'Individual' (storing them in an array that becomes the header), and prints the positions and genotypes for each "SnpInfo".

result:

rs##    Position    000461  000486  000645  000646  000648  000651  000659  000664  000671  000684  000691  000928  001026  001058  001146  002448  001976  000689  000656  AKR CAST/EI 129S6/SvEvTac   CZECHII/EI  129X1/SV    A   A/HE    B10.D2  BALB/C  BALB/CBY    C3H/HE  C57BL/6 DBA/2   MRL/MP  NZB/BLN NZW/LAC 129/SV  000653  000665  000662  000667  000668  000669  000657  000670  001800  000674  002106  000675  000676  000677  002423  000680  000683  000644  000686  000687  000688  001145  001392  000930  000550  001144  002282  DDK/PAS JF1/MS  MAI/PAS MSM/MS  PWD/PH  SEG/PAS BUB FVB LGJ LPJ MOLF    MSM PERA    SMJ PWD/PHJ
rs2020458   chrom1:174992577(37:C57BL/6J);chrom1:175911875(37:Celera);  #NA #NA #NA C/C C/C #NA C/C C/C C/C #NA #NA T/T C/C C/C #NA C/C C/C C/C C/C #NA T/T C/C #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA C/C #NA C/C #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA C/C #NA #NA C/C #NA C/C #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA C/C
rs2020459   chrom1:174992659(37:C57BL/6J);chrom1:175911957(37:Celera);  #NA #NA #NA G/G G/G #NA G/G T/T G/G #NA G/G #NA G/G #NA #NA G/G #NA G/G G/G #NA G/G G/G #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA
rs2020460   chrom1:175911965(37:Celera);chrom1:174992667(37:C57BL/6J);  #NA #NA #NA G/G G/G #NA G/G G/G G/G #NA #NA #NA G/G #NA #NA #NA #NA G/G G/G #NA T/T G/G #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA
rs2020461   chrom1:174992687(37:C57BL/6J);chrom1:175911985(37:Celera);  #NA #NA #NA C/C C/C #NA C/C C/C C/C #NA #NA T/T C/C C/C #NA C/C C/C C/C C/C #NA T/T C/C #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA C/C #NA C/C #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA C/C #NA #NA C/C #NA C/C #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA C/C
rs2020464   chrom1:72510826(37:C57BL/6J);chromUn:null(37:Celera);   #NA #NA #NA C/C C/C #NA C/C C/C C/C #NA #NA #NA C/C #NA #NA #NA #NA C/C #NA #NA T/T C/C #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA
rs2020465   chromUn:null(37:Celera);chrom1:72510799(37:C57BL/6J);   #NA #NA #NA A/A A/A #NA A/A A/A A/A #NA #NA #NA A/A #NA #NA #NA #NA A/A #NA #NA C/C A/A #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA
rs2020466   chrom1:72510751(37:C57BL/6J);chromUn:null(37:Celera);   #NA #NA #NA C/C C/C #NA C/C C/C C/C #NA #NA #NA C/C #NA #NA #NA #NA C/C #NA #NA T/T C/C #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA
rs2020467   chrom1:72510696(37:C57BL/6J);chromUn:null(37:Celera);   #NA #NA #NA A/A A/A #NA A/A A/A A/A #NA #NA #NA A/A #NA #NA #NA #NA A/A #NA #NA G/G A/A #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA #NA
 (...)
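
The layout of the table above (one column per individual, #NA where an individual has no genotype) can be sketched in Python as follows. This is not the Java program itself, only an illustration of the row formatting; the individual labels, rs ID, and position string are placeholders.

```python
def format_row(rs_id, positions, calls, individuals, missing="#NA"):
    """Build one tab-delimited row: rs id, positions, then one genotype
    column per individual, using `missing` where there is no call."""
    cols = [rs_id, "".join(pos + ";" for pos in positions)]
    cols += [calls.get(ind, missing) for ind in individuals]
    return "\t".join(cols)


# Placeholder individuals and a single made-up SNP
individuals = ["ind1", "ind2", "ind3"]
header = "\t".join(["rs##", "Position"] + individuals)
row = format_row("rs1", ["chrom1:100(asmA)"], {"ind2": "C/C"}, individuals)
print(header)
print(row)
```

Keeping the individuals in one fixed list is what keeps every row aligned with the header.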

Thanks for that pointer. Mus musculus is the domestic house mouse; the primary mouse sequence is one strain of musculus called C57BL/6. There are dozens of mouse strains in use. Most are, broadly speaking, very similar genetically, having been derived from a few common ancestors in the last ~100 years. Mus spretus is an inbred strain derived from a wild Spanish mouse, separated from C57BL/6 by 1-2 million years of evolution.


The Mus musculus strains are FVB/N, 129/SV and C57BL/6, but what is/are the strain(s) for spretus in the XML file?


Spretus is indID=4461. Found it with 'more gt_chr19.xml', and then searched with /SPRET. I think each strain is counted as an "Individual" in the schema.
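
The same search can be done without a pager. Below is a rough Python equivalent of the more + /SPRET lookup; the function name is made up, and the demo runs on a tiny temporary file with placeholder text rather than on real dbSNP content.

```python
import os
import tempfile


def find_lines(path, needle, limit=5):
    """Return up to `limit` (line number, text) pairs whose line contains
    `needle`, ignoring case. The file is read line by line."""
    needle = needle.lower()
    hits = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, 1):
            if needle in line.lower():
                hits.append((lineno, line.strip()))
                if len(hits) >= limit:
                    break
    return hits


# Demo on a throwaway file; the content is a placeholder, not dbSNP data
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write("alpha\nplaceholder line mentioning SPRET\nomega\n")
demo_hits = find_lines(tmp.name, "spret")
os.remove(tmp.name)
print(demo_hits)
```

On the real data the call would be something like find_lines("gt_chr19.xml", "SPRET").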


Hmm, I printed too many names. I should have kept only one name per Individual, but you get the idea...
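
Keeping a single name per individual is a small change. Here it is sketched in Python rather than Java, assuming the names arrive as (individual id, name) pairs; that input shape and the labels are hypothetical.

```python
def first_name_per_individual(pairs):
    """Keep only the first name seen for each individual id,
    preserving the order in which ids first appear."""
    names = {}
    for ind_id, name in pairs:
        names.setdefault(ind_id, name)
    return names


# Hypothetical ids and names: ind1 appears under two names
pairs = [("ind1", "nameA"), ("ind1", "nameB"), ("ind2", "nameC")]
header_names = list(first_name_per_individual(pairs).values())
print(header_names)
```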

14.1 years ago

Thanks to Pierre for showing me the right file. This quick and dirty Python code assumes you have the XML files locally. Output looks like:

rs      chrom   start   orient  observed        SPRETUS FVB     129     B6
rs16818137      19      6984395 fwd     C/T     C/C     T/T     T/T     T/T
rs4232043       19      8815081 fwd     C/T     C/C     T/T     T/T     T/T
rs13483521      19      9028409 fwd     G/T     G/G     T/T     T/T     T/T
rs13483524      19      9972250 fwd     C/T     C/C     T/T     T/T     T/T


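from xml.sax import make_parser, handler

# dbSNP individual IDs (indId) for the four strains of interest
idB6 = "2921"; id129 = "2924"; idFVB = "4560" ; idSPRETUS = "2928"
# Only SNPs with a position on this assembly are reported
B6_GENOME = "37:C57BL/6J"

class dbParser(handler.ContentHandler):
    """SAX handler that keeps SNPs where FVB, 129 and B6 agree and SPRETUS differs."""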
    def __init__(self, fn_out):
        self.fn_out = fn_out
        self.ids = set([idB6, id129, idFVB, idSPRETUS])
        self.genotypes = {}
        self.rs, self.obs, self.chrom, self.start, self.orient  = "", "", "", "", ""
        self.snps = []
        self.current_is_valid = False
    def startElement(self, name, attrs):
        if name=="SnpInfo":
            self.rs = attrs.get('rsId',"")
            self.obs = attrs.get('observed',"")
            self.current_is_valid=False
        elif name=="SnpLoc":
            if attrs.get("genomicAssembly")==B6_GENOME:
                self.chrom = attrs["chrom"]
                self.start = attrs["start"]
                self.orient = attrs["rsOrientToChrom"]
                self.current_is_valid=True
        elif name=="GTypeByInd":
            if self.current_is_valid:
                indID = str(attrs["indId"])
                if indID in self.ids:
                    self.genotypes[ indID ] = attrs["gtype"]
    def endElement(self, name):
        if name=="SnpInfo":
            if len(self.genotypes)==4:
                s = self.genotypes[idSPRETUS]
                f = self.genotypes[idFVB]
                o = self.genotypes[id129]
                b = self.genotypes[idB6]
                if f==o and o==b and s != f:
                    self.snps.append([int(self.start), "rs" + self.rs, self.chrom, self.start, self.orient, self.obs, s, f, o, b])
            self.current_is_valid=False
            self.genotypes = {}
    def endDocument(self):
        # Write the kept SNPs sorted by start position (the first list element)
        with open(self.fn_out, 'w') as fo:
            fo.write( "rs\tchrom\tstart\torient\tobserved\tSPRETUS\tFVB\t129\tB6\n")
            for snp in sorted(self.snps):
                fo.write( '\t'.join(snp[1:]) + '\n' )
fn_in = "gt_chr19.xml"
parser = make_parser()
parser.setContentHandler(dbParser(fn_in + ".parsed.txt"))
parser.parse(fn_in)
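
The script prints genotypes as stored, along with an orient column. Since the rs4222137 example in the question is quoted on the reverse strand, it may be worth normalising strand before comparing against another source. A small sketch, assuming orient takes the value "rev" for reverse-strand SNPs (only "fwd" appears in the output above) and that alleles are single nucleotides:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_forward(gtype, orient):
    """Complement each allele of a genotype such as 'C/T' when the SNP is
    reported on the reverse strand; leave it unchanged otherwise.
    Assumes single-nucleotide alleles and that 'rev' marks the reverse strand."""
    if orient == "rev":
        return gtype.translate(_COMPLEMENT)
    return gtype


print(to_forward("C/T", "rev"))
print(to_forward("C/T", "fwd"))
```

Characters other than A, C, G and T (the "/" separator, for instance) pass through untouched.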