Is it better to use a FASTA file that doesn't have any Ns? Your help would be much appreciated, thank you.
An N usually means that the base identity could not be established; it could be any of A, T, G, or C.
Not all tools can handle sequences with Ns in them. I would say that the majority cannot.
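The problem is, of course, what one should do with the Ns if the methodology cannot handle them. Simply deleting the Ns may not have the desired effect: it shifts every downstream coordinate and joins bases that were never adjacent in the original sequence.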
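To illustrate why deleting Ns can backfire, here is a minimal Python sketch. The helper names (`n_runs`, `delete_ns`) and the toy sequence are made up for this example and do not come from any particular tool. It locates runs of N and shows how a downstream motif changes position once the Ns are removed.

```python
import re

def n_runs(seq):
    """Return half-open (start, end) intervals of consecutive N/n characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]

def delete_ns(seq):
    """Naively strip every N/n from the sequence (hypothetical helper)."""
    return re.sub(r"[Nn]+", "", seq)

# Toy sequence, invented for illustration only.
seq = "ACGTNNNNGATTACA"
motif = "GATTACA"

runs = n_runs(seq)
stripped = delete_ns(seq)

print("N runs:", runs)
print("motif position before deletion:", seq.find(motif))
print("motif position after deletion: ", stripped.find(motif))
```

After deletion the motif sits four bases earlier, and the T and G on either side of the gap now look adjacent although nothing in the data supports that. Any annotation given in the original coordinates would no longer line up.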
Soft masking indicates masked regions by using lower-case letters. Hard masking indicates masked regions with N for nucleotides or X for proteins.
The human genome consists of (at least) two-thirds repetitive elements. These repetitive elements are either soft-masked, by converting the upper-case letters to lower case, or hard-masked, by replacing them with Ns. Annotation tools prefer soft-masked genomes, because they primarily search for genes in non-repeated regions but tolerate genes that partially overlap these regions.
In my opinion, use whichever version is available to you, do your own validation (benchmarking), and if the results are good, then the one you chose is the best one for you.
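As a rough sketch of the difference between the two masking styles, the Python snippet below converts a soft-masked sequence into a hard-masked one and reports how much of it is masked. The function names and the example sequence are assumptions made for illustration; real genomes are normally masked with dedicated tools rather than by hand.

```python
def soft_to_hard(seq, mask_char="N"):
    """Replace every lower-case (soft-masked) base with mask_char."""
    return "".join(mask_char if base.islower() else base for base in seq)

def soft_masked_fraction(seq):
    """Fraction of the sequence written in lower case (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)

# Toy soft-masked sequence, invented for illustration only.
soft = "ACGTacgtacGGCC"
hard = soft_to_hard(soft)

print(soft)
print(hard)
print(f"soft-masked fraction: {soft_masked_fraction(soft):.2f}")
```

The conversion only goes one way. A soft-masked sequence still holds the original bases, so calling `soft.upper()` recovers the unmasked sequence, whereas the hard-masked version has lost them. That is one reason to keep the soft-masked file if a choice is available.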
Can you add some context to your question?
What kind of FASTA file are you dealing with? Genomic data? Transcriptome data? ...