Can anyone recommend a tool or unix command line to remove terminal (leading or trailing) Ns from a fasta file?
Thanks in advance for any advice.
See if this works with seqkit to remove trailing Ns:
$ seqkit -is replace -p "n+$" -r "" test.fa
To remove leading Ns as well (as mentioned in the OP), try the following:
$ seqkit -is replace -p "^n+|n+$" -r "" test.fa
Try this with sed to remove leading and trailing n:
$ sed -r '/^>/! s/n+$|^n+//g' test.fa
Your sed command runs the risk of removing internal Ns that occur before/after line breaks in the sequence. To avoid that you need to linearize the sequence first. It also creates an undesirable empty line between the definition line and the sequence.
This revision avoids all that. It linearizes each record (awk), converts it to a two-line fasta record (tr), removes the leading and trailing Ns (sed; I use N instead of n because my Ns are typically capitalized, and you can handle both by substituting [Nn]), then re-wraps (fold) the long sequence line into lines of 80 nt. Note that fold wraps every line, so make sure the width is longer than your longest definition line, otherwise it will break those deflines at 80 characters too.
awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' my.fasta | tr "\t" "\n" | sed -r '/^>/! s/N+$|^N+//g' | fold -w 80 > my.Ntrimmed.fasta
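For anyone who would rather not chain awk, tr, sed and fold, here is a minimal Python sketch of the same idea: join each record's sequence lines, strip N/n from both ends, then re-wrap. The function name `trim_terminal_ns` and the `width` parameter are made up for illustration, and it assumes plain FASTA text with `>` definition lines. Unlike fold, it never wraps the definition lines.

```python
def trim_terminal_ns(fasta_text, width=80):
    """Strip leading/trailing N (either case) from every record in FASTA text."""
    records = []  # list of (definition line, [sequence line, ...])
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line, []))
        elif records:
            records[-1][1].append(line)

    out = []
    for header, chunks in records:
        # Linearize first so Ns next to a line break are not treated as terminal.
        seq = "".join(chunks).strip("Nn")
        out.append(header)  # definition lines are left unwrapped
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


example = ">c1 desc\nNNACGT\nNNACNN\n>c2\nnnAC\n"
print(trim_terminal_ns(example, width=4), end="")
# >c1 desc
# ACGT
# NNAC
# >c2
# AC
```

The internal NN run in c1 survives because trimming happens after the lines are joined, not line by line.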
Use seqkit subseq. Example:
seqkit subseq -r 6:-1 test.fa # remove the first 5 bases
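A fixed region like 6:-1 only helps if you already know how many Ns there are. As a rough sketch, the region could be computed per sequence. This assumes the region is 1-based with negative positions counted from the end (-1 being the last base), as in the example above. The helper name `region_without_terminal_ns` is hypothetical.

```python
def region_without_terminal_ns(seq):
    """Return a 'start:end' region string that skips terminal Ns, or None if seq is all N."""
    lead = len(seq) - len(seq.lstrip("Nn"))    # number of leading Ns
    trail = len(seq) - len(seq.rstrip("Nn"))   # number of trailing Ns
    if lead == len(seq):
        return None  # nothing but Ns (or empty): no region to keep
    # 1-based start; the end is negative, counted from the last base (-1).
    return f"{lead + 1}:{-(trail + 1)}"


print(region_without_terminal_ns("NNNNNACGTNN"))  # 6:-3
print(region_without_terminal_ns("ACGT"))         # 1:-1
```

Since the region differs from record to record, this would have to be applied one sequence at a time, so the regex-based answers are probably simpler for a whole assembly.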
Another option is a Perl one-liner:
perl -pe 's/^([ACGT][ACGTN]+?)N+$|(^N+)/$1/gi' test.fa
Apparently this command works only if the Ns are at the start OR at the end of a line. It does not work if there are Ns on both sides.
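To illustrate why, here is a small Python sketch that reuses the same pattern (Python's `re` treats it the same way for this purpose). Once the leading Ns are consumed, the first alternative can no longer match because its `^` only matches at the very start of the string, so the trailing Ns are left behind. A plain `^N+|N+$` alternation handles both ends. The function names are only for illustration, and the input is assumed to be a single linearized sequence.

```python
import re

def perl_style(seq):
    # Same pattern as the one-liner; an unmatched group substitutes as empty.
    return re.sub(r"^([ACGT][ACGTN]+?)N+$|(^N+)", r"\1", seq, flags=re.I)

def both_ends(seq):
    # Each alternative is anchored on its own, so both ends are trimmed.
    return re.sub(r"^N+|N+$", "", seq, flags=re.I)


print(perl_style("ACGTNN"))     # ACGT   (trailing only: works)
print(perl_style("NNACGTNN"))   # ACGTNN (both sides: trailing Ns remain)
print(both_ends("NNACGTNN"))    # ACGT
print(both_ends("NNACNGTNN"))   # ACNGT  (internal N kept)
```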
Thank you everyone.
This request was to help with submission of genome assemblies to Genbank. They ask that the terminal Ns (gaps) be removed from the ends of contigs. However, I have found that if you simply leave them on they will remove them as part of their process.
stacy734: While that may be the case, since you asked this question in the first place, can you please test the posted answers and accept any/all of those that work? This would benefit future users who find this thread by searching.
In fact, this no longer works, so you need one of the other solutions.