9.5 years ago
anderspitman
I've heard that sometimes sequences are stored in files partially lowercase and partially uppercase, with the intent that lower- or upper-case regions indicate introns or exons.
Is this a common practice? Are there file formats that explicitly support this type of encoding? Google hasn't yielded anything, but I might just be searching for the wrong things.
Can you link to any examples where the convention is explained?
Lowercase to distinguish exons from introns: see this paper, which refers to this (or this) database; or see this server, which generates pretty graphics of gene structure.
Lowercase for soft-masking (masked regions such as repeats are written in lowercase rather than replaced with N): see the USEARCH manual or the BLAST lcase_masking parameter.
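Whichever convention a file uses, the lowercase regions can be recovered from the sequence string itself. Here is a minimal Python sketch, assuming the sequence has already been read into a plain string. The function names `lowercase_runs` and `hard_mask` are made up for illustration and do not come from any library or from the tools mentioned above.

```python
import re

def lowercase_runs(seq):
    """Return half-open (start, end) intervals of each lowercase run in seq."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]

def hard_mask(seq, mask_char="N"):
    """Replace every lowercase letter with mask_char (soft mask -> hard mask)."""
    return re.sub(r"[a-z]", mask_char, seq)

example = "ACGTacgtnnACGTTGca"
print(lowercase_runs(example))  # [(4, 10), (16, 18)]
print(hard_mask(example))       # ACGTNNNNNNACGTTGNN
```

The intervals are 0-based and half-open, so `seq[start:end]` gives back each lowercase run. Whether a run means "intron", "exon" or "masked repeat" depends entirely on the source of the file, so check its documentation before interpreting the case.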