I'm just sketching a possible scenario, so please feel free to close this question if it is not realistic. I'm a developer learning molecular biology and fascinated by the world of bioinformatics. Recently I've been asking myself what the nature of a sequence is, and I think it's best explained with the following "example": if you queried a non-curated sequence database and got back a sequence like this one:
'----------'
would you still consider it a biological sequence even though it doesn't include any IUPAC letter?
Ha, this could become really philosophical... Or it already is, considering David's comment.
But let us do the basics first. In bioinformatics we use the word sequence for a series of chained biomolecules: either a sequence of nucleotides (making up DNA and RNA) or a sequence of amino acids (making up a protein). In both cases we use letters to indicate individual bases or amino acids. And since we are sometimes not sure which residue is present, we use other letters to indicate ambiguity, up to an "N" (any base) or an "X" (any amino acid), where it can be anything. We normally (somebody will probably come up with an exception here...) do not use dashes to indicate sequence components. We use dashes in alignments of two or more sequences, to indicate that something present in one of them is missing in another. A dash indicates a gap, but that is not a physical gap! The sequence really is connected. The gap is only part of a comparison. So the enlightened one might say something like "a dash is something that is absent in my mind, but nothing is missing in reality".
So, to answer your question: no, I would not consider a series of dashes a sequence. But if what you want to do is parse sequence comparisons (alignments), I would certainly take the dashes into account.
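For the developer in you, here is a rough sketch of how a parser could make that distinction. It is only an illustration under some assumptions of mine: the function name `classify` and its return labels are made up for this example, it only covers nucleotide records, and it treats both `-` and `.` as gap characters (some alignment formats use one, some the other).

```python
# IUPAC nucleotide codes: the bases A, C, G, T, U plus the ambiguity letters.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")

# Assumption: '-' and '.' both denote alignment gaps in the input.
GAP_CHARS = set("-.")


def classify(record: str) -> str:
    """Label a record (hypothetical helper, not from any library).

    Returns one of: 'empty', 'gaps_only', 'sequence',
    'aligned_sequence' or 'invalid'.
    """
    symbols = record.strip().upper()
    if not symbols:
        return "empty"

    # Everything that is not a gap character should be a residue letter.
    residues = [c for c in symbols if c not in GAP_CHARS]
    if not residues:
        # Only dashes: an alignment row with nothing in it, not a sequence.
        return "gaps_only"
    if any(c not in IUPAC_NUCLEOTIDE for c in residues):
        return "invalid"

    # Letters mixed with gaps only make sense as a row of an alignment.
    if len(residues) < len(symbols):
        return "aligned_sequence"
    return "sequence"


for example in ["----------", "ACGTN", "AC--GT", "AC!GT"]:
    print(repr(example), "->", classify(example))
```

With this, your `'----------'` record comes out as `gaps_only` rather than as a sequence, while `AC--GT` is still accepted as a row taken from an alignment.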
Thanks Chris, so it doesn't make any sense to have a parser that accepts only the plain sequence letters, with no ambiguity codes such as X and no dashes. Have you (or anyone here) ever needed to identify regions with NO ambiguity?
"Grasshopper, when you can snatch this sequence from my palm you will be enlightened."