I am wondering whether the lowercase a, t, g, c in sequenced genomes have any special meaning.
That's called "soft masking". Generally, the lowercase bases mark regions that were identified as repeats by RepeatMasker (or some other tool). There are a couple of options when it comes to masking a genome. Aside from soft masking, one can "hard mask", meaning the masked region is replaced with a run of N's. This, of course, throws away the underlying sequence, which can make the genome pretty useless for many use cases. Consequently, you'll often find people keeping soft-masked genomes around (I recall that UCSC provides both, with the regular genomes that you download already being soft masked).
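To make the difference concrete, here is a minimal sketch in plain Python of how the two kinds of masking relate. The function names (`soft_masked_intervals`, `hard_mask`, `unmask`) and the example sequence are made up for illustration and are not part of RepeatMasker or any other tool; the sketch only assumes that masked bases are lowercase and unmasked bases are uppercase.

```python
import re


def soft_masked_intervals(seq):
    """Return 0-based, half-open (start, end) intervals of lowercase runs."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def hard_mask(seq):
    """Convert soft masking to hard masking: each lowercase base becomes N."""
    return re.sub(r"[a-z]", "N", seq)


def unmask(seq):
    """Drop the soft masking by uppercasing; no sequence information is lost."""
    return seq.upper()


# Hypothetical soft-masked sequence: two lowercase (masked) stretches.
example = "ACGTacgtACGGttaGA"

intervals = soft_masked_intervals(example)
hard = hard_mask(example)
plain = unmask(example)

print(intervals)
print(hard)
print(plain)
```

Note that going from soft masked to hard masked (or to unmasked) is a simple one-way conversion, while a hard-masked genome cannot be turned back, which is one reason to keep the soft-masked version around.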