Why does the chr1.fa FASTA file have a bunch of Ns, and why is some of the DNA in lower case while the rest is in upper case?
11.7 years ago
sameer ▴ 10

Hi,

I have a couple of questions about the chr1.fa FASTA file at the link below:

Q1) Why does the beginning of the file have a whole bunch of N characters? The IUPAC code for DNA sequences says that N means any nucleotide base, so does this mean that the sequencing equipment could not correctly call the bases at the beginning of Chromosome 1? Also, starting at line 3550 or line 76,907 there are around a hundred more lines of Ns.

Q2) Why are parts of the DNA in lower case, while other parts are in upper case?

Link to the Chromosome 1 file: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/?C=S;O=A

fasta • 6.4k views

I detect missing effort in reading the documentation found on that very same site.


I have the same issue! How did you resolve it?

11.7 years ago
Bert Overduin ★ 3.7k
  1. The Ns at the end of the chromosomes represent unsequenced heterochromatin.

  2. On the page http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/ you can read: "Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12 or less) are shown in lower case; non-repeating sequence is shown in upper case. RepeatMasker was run with the -s (sensitive) setting. Using: Jan 29 2009 (open-3-2-7) version of RepeatMasker and RELEASE 20090120 of library RepeatMaskerLib.embl". So the sequence has been "soft-masked", i.e. the repeats are shown in lower case. The alternative is "hard-masking", in which repeats are replaced by Ns.
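To make the two masking conventions concrete, here is a minimal Python sketch. It is not part of the UCSC tooling; the function names (`n_runs`, `mask_summary`, `hard_mask`) and the toy sequence are made up for illustration. It assumes the sequence has already been read into a single string with the FASTA header line and newlines removed.

```python
import re
from collections import Counter


def n_runs(seq):
    """Return 0-based, half-open (start, end) intervals of consecutive N/n."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def mask_summary(seq):
    """Count bases that are N, soft-masked (lower case), or unmasked (upper case)."""
    counts = Counter()
    for base in seq:
        if base in "Nn":
            counts["N"] += 1
        elif base.islower():
            counts["soft_masked"] += 1
        else:
            counts["unmasked"] += 1
    return counts


def hard_mask(seq):
    """Turn a soft-masked sequence into a hard-masked one (lower case becomes N)."""
    return re.sub(r"[a-z]", "N", seq)


# Hypothetical toy sequence: a leading gap, a soft-masked repeat, unique
# sequence, a short internal gap, then a partly masked tail.
toy = "NNNNacgtACGTTGCAnnACgt"

print(n_runs(toy))
print(dict(mask_summary(toy)))
print(hard_mask(toy))
```

Note that hard-masking loses information: once repeats are replaced by N, they can no longer be told apart from real gaps in the assembly, which is one reason the soft-masked files are often the more convenient download.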


unsequenced heterochromatin

Like telomeres or centromeres, right?
