What does it mean when a FASTA file has many "NNN's"?
2.9 years ago
aimish • 0

Is it better to use a FASTA file that doesn't have any Ns? Your help would be much appreciated, thank you.

fasta phylogenetics mega • 1.9k views

Can you add some context to your question?

What kind of FASTA file are you dealing with? Genomic data? Transcriptome data? ...

2.9 years ago

An N usually means that the identity of the base could not be established; it could be any of A, C, G, or T.

Not all tools can handle sequences with Ns in them. I would say that the majority cannot.
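Before handing a file to a tool that may not cope with Ns, it can help to check how many each sequence actually contains. The following is a minimal Python sketch, assuming a plain, uncompressed FASTA file; `read_fasta` and `n_fraction` are hypothetical helper names, not functions from any particular library (Biopython's `SeqIO` could do the parsing instead).

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (header, sequence) pairs from FASTA lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def n_fraction(seq: str) -> float:
    """Hypothetical helper: fraction of positions that are N (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


if __name__ == "__main__":
    example = [">seq1", "ACGTNNNNACGT", ">seq2", "ACGTACGTACGT"]
    for name, seq in read_fasta(example):
        # name, length, fraction of N
        print(f"{name}\t{len(seq)}\t{n_fraction(seq):.2f}")
```

For the toy input, `seq1` has 4 Ns out of 12 bases (0.33) and `seq2` has none. For a real file, passing an open file handle, e.g. `read_fasta(open("sequences.fasta"))`, should work; the filename is only a placeholder.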

The problem, of course, is what to do with the Ns if the method cannot handle them. Simply deleting the Ns may not have the desired effect.
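To illustrate why deleting Ns can backfire, here is a small sketch, assuming the Ns mark a stretch of unknown sequence (for example a gap in a scaffold). Removing them shortens the sequence, shifts the coordinates of everything downstream, and joins two flanks that are not really adjacent; in a multiple alignment it would also push that sequence out of register with the others. Splitting at runs of N while recording each fragment's original start is shown only as one possible alternative; `delete_ns` and `split_at_n_runs` are hypothetical names.

```python
import re
from typing import List, Tuple


def delete_ns(seq: str) -> str:
    """Hypothetical helper: naively drop every N, so downstream positions shift left."""
    return re.sub("[Nn]", "", seq)


def split_at_n_runs(seq: str) -> List[Tuple[int, str]]:
    """Hypothetical helper: return (0-based original start, fragment) for each N-free stretch."""
    return [(m.start(), m.group()) for m in re.finditer("[^Nn]+", seq)]


seq = "ACGTAC" + "N" * 5 + "GGCCTT"
print(delete_ns(seq))        # ACGTACGGCCTT  (GGCCTT now starts at 6, not 11)
print(split_at_n_runs(seq))  # [(0, 'ACGTAC'), (11, 'GGCCTT')]
```

The deleted version contains an artificial `ACGG` junction that never existed in the sample, which is exactly the kind of side effect that can mislead downstream analyses.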

2.9 years ago

Soft masking marks masked regions with lowercase letters. Hard masking replaces masked regions with N for nucleotides or X for proteins. Around half of the human genome, and by some estimates up to two-thirds, consists of repetitive elements. These repeats are either soft-masked, by converting the uppercase letters to lowercase, or hard-masked, by replacing them with Ns. Annotation tools prefer soft-masked genomes: they primarily search for genes in non-repeated regions, but they tolerate genes that partially overlap these regions.

In my opinion, use whichever is available to you, do your own validation (benchmarking), and if the results are good for you, then the one you have chosen is the best one.
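The relationship between the two masking styles can be sketched in a few lines of Python, assuming the soft-masked input follows the usual convention that lowercase letters mark masked (repeat) regions. Hard masking replaces each lowercase residue with N (or X for proteins), while uppercasing removes soft masking altogether. `hard_mask` and `unmask` are hypothetical helper names for illustration only.

```python
def hard_mask(seq: str, mask_char: str = "N") -> str:
    """Hypothetical helper: replace soft-masked (lowercase) residues with mask_char."""
    return "".join(mask_char if c.islower() else c for c in seq)


def unmask(seq: str) -> str:
    """Hypothetical helper: drop soft masking by uppercasing.

    This cannot recover bases that were already hard-masked to N.
    """
    return seq.upper()


soft = "ACGTacgtacGTTA"
print(hard_mask(soft))                       # ACGTNNNNNNGTTA
print(unmask(soft))                          # ACGTACGTACGTTA
print(hard_mask("MKVLaaGT", mask_char="X"))  # MKVLXXGT
```

This also shows why soft masking is the more flexible choice: the soft-masked sequence can always be converted to a hard-masked one, but once a base has become N its identity is gone.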
