Hi,
I have a couple of questions about the chr1.fa FASTA file at the link below:
Q1) Why does the file begin with a long run of N characters? According to the IUPAC nucleotide codes, N means any base, so does this mean the sequencing equipment could not determine the bases at the beginning of Chromosome 1? Also, further into the file, for example starting at line 3550 or line 76,907, there are roughly a hundred more lines of Ns.
Q2) Why are parts of the DNA in lower case, while other parts are in upper case?
Link to the Chromosome 1 file: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/?C=S;O=A
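For anyone who wants to locate these regions programmatically rather than by scrolling, here is a minimal Python sketch that lists the runs of N and the runs of lower-case bases in a FASTA record. The helper names `read_sequence` and `find_runs` and the tiny demo record are made up for illustration; for the real chr1.fa you would pass an open file handle instead of the demo list, and this simple version assumes the whole sequence fits in memory.

```python
import re

def read_sequence(lines):
    """Join all sequence lines of a single-record FASTA, skipping the '>' header."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))

def find_runs(seq, pattern):
    """Return (start, end) pairs for each match; 0-based, end exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, seq)]

# Hypothetical demo record, not real chr1 data.
demo = [">chr_demo", "NNNNNNacgt", "ACGTNNNNac", "GT"]
seq = read_sequence(demo)

n_runs = find_runs(seq, r"[Nn]+")        # stretches of N
lower_runs = find_runs(seq, r"[acgt]+")  # stretches of lower-case bases

print("N runs:", n_runs)
print("lower-case runs:", lower_runs)
```

With the real file, something like `with open("chr1.fa") as fh: seq = read_sequence(fh)` should work the same way, and the resulting coordinates can then be compared against the documentation on the download site.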
I detect missing effort in reading the documentation found on that very same site.
I have the same issue! How did you resolve it?