I've been looking for examples of how to store restriction sites in GFF3 format, but I have been unable to find any. Assuming I use the correct SO term in the third ('type') column, it shouldn't be too hard. My questions are mostly details.
- Should the start and end positions correspond to the recognized palindromic sequence, or should they correspond to the exact cleavage site?
- When a sequence is digested by a restriction enzyme, do the lengths of the resulting fragments include the sticky, overhanging, single-stranded DNA, or do the lengths only extend as far as the DNA is double-stranded?
Thanks!
Just to be clear: when you say "store restriction sites", do you mean that you want to store a feature corresponding to the recognition sequence? For example, do you want to know whether EcoRI (GAATTC) should have start = 1 and end = 6?
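To make that reading concrete, here is a minimal sketch of what storing the recognition sequence as a feature could look like. The function name, the `Name=` attribute, and the `.` placeholders for source, score, strand, and phase are illustrative choices, and the type string `restriction_enzyme_recognition_site` is an assumption that should be checked against the current Sequence Ontology. The only firm point is that GFF3 coordinates are 1-based and inclusive.

```python
def recognition_site_gff3(seqid, sequence, enzyme="EcoRI", site="GAATTC",
                          so_type="restriction_enzyme_recognition_site"):
    """Return one GFF3 line per occurrence of `site` in `sequence`.

    Hypothetical helper: `so_type` is an assumed SO term, not a verified one.
    """
    lines = []
    pos = sequence.find(site)  # 0-based index of the first match, or -1
    while pos != -1:
        start = pos + 1          # GFF3 start is 1-based
        end = pos + len(site)    # GFF3 end is inclusive
        columns = [seqid, ".", so_type, str(start), str(end),
                   ".", ".", ".", f"Name={enzyme}"]
        lines.append("\t".join(columns))
        pos = sequence.find(site, pos + 1)  # keep searching after this match
    return lines


for line in recognition_site_gff3("seq1", "GAATTCAAAAGAATTC"):
    print(line)
```

With this convention, a sequence that begins with GAATTC gives a feature with start = 1 and end = 6. The cleavage position itself is not recorded here; it would need either a separate feature or an attribute.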
Yeah, that's not very clear from my question. It deals with both the recognition site and the resulting restriction fragments (which may or may not be stored as features in the same file). Basically, how do I store recognition sites, and how is that interpreted in relation to the restriction fragments? If I ignore the sticky ends of the fragments, the combined lengths of the fragments will be less than the original sequence length, but if I include them, the combined length will be greater. Does this make sense?
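The length arithmetic can be checked with a small sketch. It assumes a linear molecule, sites given as sorted 0-based start positions that do not overlap, and EcoRI-style staggered cuts (G^AATTC on the top strand, so the top strand is cut 1 base into the site and the bottom strand 5 bases in, in top-strand coordinates). The function name and parameters are made up for illustration.

```python
def fragment_lengths(seq_len, site_starts, top_cut=1, bottom_cut=5):
    """Compare three ways of measuring restriction fragment length.

    Hypothetical helper. `site_starts` are sorted 0-based site positions;
    `top_cut` and `bottom_cut` are cut offsets from the site start.
    """
    # Cut positions on each strand, with the molecule ends as boundaries.
    top = [0] + [s + top_cut for s in site_starts] + [seq_len]
    bottom = [0] + [s + bottom_cut for s in site_starts] + [seq_len]

    ds_only, with_overhang, top_strand = [], [], []
    for i in range(len(top) - 1):
        left_t, right_t = top[i], top[i + 1]
        left_b, right_b = bottom[i], bottom[i + 1]
        # Region where both strands are present (double-stranded only).
        ds_only.append(min(right_t, right_b) - max(left_t, left_b))
        # Region covered by either strand (includes the sticky ends).
        with_overhang.append(max(right_t, right_b) - min(left_t, left_b))
        # Length of the top strand alone.
        top_strand.append(right_t - left_t)
    return ds_only, with_overhang, top_strand


ds, full, top = fragment_lengths(20, [7])
print(sum(ds), sum(full), sum(top))  # prints "16 24 20"
```

For a 20 bp sequence with one site, the double-stranded-only lengths sum to 16 (4 short), the lengths including overhangs sum to 24 (4 over), and the single-strand lengths sum to exactly 20. So measuring fragments along one strand is one convention under which the fragment lengths add up to the original length.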
Yes, that helps. This is one of those questions that seems simple, until you think about it :-)