Hi
I don't know if this is an appropriate question for this forum. If not, I apologise in advance and won't ask questions of this nature again. If it is appropriate I look forward to an interesting answer.
Does anyone have any interesting ideas for how you might model, in a database, the conservation score of each position in a genome? The obvious first thought is a table called Locus with one row per base in the genome and fields called Position, Chromosome and conservationScore.
That's a large database table for the human genome, but it can be indexed and should be a breeze for any enterprise database like MySQL. However, I was interested to learn of any other perspectives and approaches. Sometimes when the solution is so obvious you don't think laterally enough.
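To make the obvious design concrete, here is a minimal sketch of the Locus table. It is only an illustration under some assumptions: SQLite (via Python's built-in sqlite3 module) stands in for MySQL so the example runs on its own, positions are assumed to be 1-based, the rows are made-up placeholder values rather than real scores, and score_at is a hypothetical helper name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Locus (
        Chromosome        TEXT    NOT NULL,
        Position          INTEGER NOT NULL,  -- assumed 1-based coordinate
        conservationScore REAL,
        PRIMARY KEY (Chromosome, Position)   -- doubles as the lookup index
    )
    """
)

# Made-up placeholder rows, not real conservation scores.
rows = [("chr1", 100, 0.12), ("chr1", 101, 0.98), ("chr2", 100, -1.5)]
conn.executemany("INSERT INTO Locus VALUES (?, ?, ?)", rows)


def score_at(chromosome, position):
    """Return the score stored for one locus, or None if there is no row."""
    row = conn.execute(
        "SELECT conservationScore FROM Locus "
        "WHERE Chromosome = ? AND Position = ?",
        (chromosome, position),
    ).fetchone()
    return row[0] if row else None


print(score_at("chr1", 101))
```

The composite primary key on (Chromosome, Position) is what keeps a single-locus lookup cheap even with one row per base.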
Using a database implies that you want to do relational queries - is that your aim? Can you give an example of the biological question you want to answer? Are you expecting data for every base? What is the thing each base is compared with to get conservationScore?
At present the conservation scores will just be inserted into the database and then used by the biologist to assess the importance of a SNP at a given position. My other data is in a relational database, but I'm not averse to obtaining the locus for a SNP from the database and then using that as a hook to get conservation scores from another format, if that format is smaller and more efficient.
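As a rough idea of what such a non-relational format could look like, here is a sketch of one possibility: a flat binary file per chromosome holding one 32-bit float per base, so the score for a locus is found by seeking to a byte offset. This is purely an assumption for illustration, not an existing standard format; the file name, the write_scores and read_score helpers, and the three example scores are all made up. Whether it ends up smaller than the table would depend on the database's storage overhead.

```python
import os
import struct
import tempfile

# Assumption: one little-endian 32-bit float per base, in position order.
RECORD = struct.Struct("<f")


def write_scores(path, scores):
    """Write one score per base for a single chromosome."""
    with open(path, "wb") as fh:
        for score in scores:
            fh.write(RECORD.pack(score))


def read_score(path, position):
    """Return the score at a 1-based position by seeking straight to it."""
    if position < 1:
        raise IndexError("positions are 1-based")
    with open(path, "rb") as fh:
        fh.seek((position - 1) * RECORD.size)
        data = fh.read(RECORD.size)
    if len(data) < RECORD.size:
        raise IndexError("position beyond end of chromosome file")
    return RECORD.unpack(data)[0]


# Made-up placeholder scores for a three-base "chromosome".
tmpdir = tempfile.mkdtemp()
chr1_path = os.path.join(tmpdir, "chr1.scores")
write_scores(chr1_path, [0.5, -1.25, 2.0])
print(read_score(chr1_path, 2))  # prints -1.25
```

The SNP's chromosome and position would come from the relational database as before, and only the score lookup would go to the file.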
I haven't looked into the availability of the data thoroughly, but I think I will use the human genome to start with. I was thinking of phyloP, phastCons and GERP scores, but I don't know enough about how these are calculated to know whether I should expect them to correlate. It appears not, from this post: Correlation Between Genome Conservation Scores (Phastcons Vs. Phylop)?