Question

What does it mean when a FASTA file has many "NNN's"?

2.9 years ago

aimish • 0

Is it better to use a FASTA file that doesn't have any Ns? Your help would be much appreciated, thank you.

fasta phylogenetics mega • 1.9k views

updated 2.9 years ago by a.alnawfal.1992 ▴ 360 • written 2.9 years ago by aimish • 0

Can you add some context to your question?

What kind of FASTA file are you dealing with? Genomic data? Transcriptome data? ...

2.9 years ago by lieven.sterck 15k

score 2 · Answer 1 · 2022-02-01

An N usually means that the identity of the base could not be established; it could be any of A, T, G or C (N is the IUPAC ambiguity code for "any nucleotide").
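To see how many Ns a file actually contains before deciding what to do, a quick per-sequence count is often enough. The sketch below is a hypothetical helper, not part of any particular tool: it assumes a plain (uncompressed) FASTA file and uses a minimal reader that keys each record on the first word of its header.

```python
import io
from typing import Dict, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Minimal FASTA reader: first word of each header -> full sequence."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def n_fraction(seq: str) -> float:
    """Fraction of positions that are N (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


if __name__ == "__main__":
    example = ">seq1 sample\nACGTN\nNNNAC\n>seq2\nACGTACGT\n"
    for name, seq in read_fasta(io.StringIO(example)).items():
        print(f"{name}\t{len(seq)}\t{n_fraction(seq):.2%}")
```

For a real file, pass `open("your_file.fasta")` (a placeholder path) instead of the `io.StringIO` example. Sequences with a high N fraction are the ones most likely to cause trouble downstream.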

Not all tools can handle sequences with Ns in them. I would say that the majority cannot.

The problem, of course, is what one should do with the Ns if the method cannot handle them. Simply deleting the Ns may not have the desired effect: it joins the flanking sequences into an artificial junction and shifts every downstream position.
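The coordinate problem is easy to demonstrate. The sketch below compares naive deletion with one alternative, splitting the sequence at N runs while recording where each piece started in the original. The function names are hypothetical, and splitting is only one option; whether it is appropriate depends on the tool and the analysis.

```python
import re
from typing import List, Tuple


def delete_ns(seq: str) -> str:
    """Naive approach: drop every N, which shifts all downstream coordinates."""
    return re.sub(r"[Nn]", "", seq)


def split_at_n_runs(seq: str) -> List[Tuple[int, str]]:
    """Split into N-free pieces, keeping each piece's 0-based start in the original."""
    return [(m.start(), m.group()) for m in re.finditer(r"[^Nn]+", seq)]


if __name__ == "__main__":
    original = "ACGTNNNNGATTACA"
    print(original.find("GATTACA"))             # position in the original
    print(delete_ns(original).find("GATTACA"))  # shifted after deletion
    print(split_at_n_runs(original))            # pieces keep original offsets
```

Here the motif sits at position 8 in the original but at position 4 after deletion, and `ACGT` is glued directly onto `GATTACA` even though they were never adjacent. The split version keeps both pieces and their original offsets (0 and 8).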

score 1 · Answer 2 · 2022-02-02

Soft masking marks masked regions with lower-case letters. Hard masking replaces masked regions with N (for nucleotides) or X (for proteins). The human genome consists of (at least) two-thirds repetitive elements. These repetitive elements are either soft-masked, by converting the upper-case letters to lower case, or hard-masked, by replacing them with Ns. Annotation tools prefer soft-masked genomes: they primarily search for genes in non-repetitive regions, but tolerate genes that partially overlap these regions.

In my opinion, use whichever is available to you, do your own validation (benchmarking), and if the results are good for you, then the one you have chosen is the best one.
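The difference between the two masking styles can be shown on a single sequence. This is a minimal sketch with hypothetical helper names, assuming a soft-masked input where every lower-case letter marks a repeat. Note that hard masking is one-way: once a base becomes N, the original letter is gone, whereas soft masking can be undone by upper-casing.

```python
def hard_mask(seq: str, mask_char: str = "N") -> str:
    """Replace every lower-case (soft-masked) letter with mask_char (N for DNA, X for protein)."""
    return "".join(mask_char if c.islower() else c for c in seq)


def unmask(seq: str) -> str:
    """Remove soft masking; this cannot recover bases that were hard-masked."""
    return seq.upper()


def soft_masked_fraction(seq: str) -> float:
    """Fraction of positions that are soft-masked (lower case)."""
    if not seq:
        return 0.0
    return sum(c.islower() for c in seq) / len(seq)


if __name__ == "__main__":
    soft = "ACGTacgtacGTAC"
    print(hard_mask(soft))             # repeat region becomes Ns
    print(unmask(soft))                # original bases, no mask
    print(soft_masked_fraction(soft))  # share of masked positions
```

In this toy sequence, `hard_mask` returns `ACGTNNNNNNGTAC`, so a hard-masked file with many N runs may simply reflect masked repeats rather than bases that could not be sequenced.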