How to reduce the length of Ns in a fasta sequence
4.4 years ago
hemu.csb • 0

Hello all, I have a multi-FASTA file in which many sequences contain long runs of Ns. I want to shorten every run of Ns to a maximum of 150. Any suggestions? Thank you.

The example sequence below contains a run of 215 Ns.

Seq1 TAGCAGCAGCAATAGCACTAGCAGTACGAGTAGCGGTAGCAGCAggtagtagcagtagcagcagtagtagaagtagtagtagtagtagtagtagtagtactagt agtactagtagtagtagtagcagcagcagcagcagcagcagtagcagcagccgCAGGGGGAGACAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNTAGTAGAATGGGTGGTATaagtagcagcagcagcagcagcagcagtagcagcagcagcagctgTAGCAATAACAGCAGCA GCACCAGCAGTAGctgtagcagcagcagtagcagcagcaagAGTAGCAGGAGGAGTAGGAGGAGTAGGAGNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNGGCAacagtagtagcagcagcagcagtagcagcaggagtagcagtagcagtagtagtagc

4.4 years ago
microfuge ★ 1.9k

SeqKit has quite good FASTA manipulation functions, and seqkit replace fits this case: https://bioinf.shenwei.me/seqkit/usage/#replace

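It takes a search pattern with the -p argument, which in this case is N repeated 150 or more times, N{150,}, and a replacement with -r, which can be N typed out 150 times or generated with printf. The -s flag applies the replacement to the sequence instead of the header, and -i ignores case: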

seqkit replace -p '(N{150,})' -r $(printf "%0.sN" {1..150}) -s -i yourFastaFile.fa
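
The printf part of the command only builds the replacement string. A minimal sketch of that step on its own, assuming you want the run length as a parameter (brace expansion such as {1..150} does not accept a variable); the helper name n_run is made up for illustration and is not part of seqkit:

```shell
# Hypothetical helper: print a run of N characters of the requested length.
# printf pads an empty string to the given width with spaces, and tr turns
# each space into an N.
n_run() {
    printf '%*s' "$1" '' | tr ' ' 'N'
}

# Build the replacement once and reuse it.
replacement=$(n_run 150)

# The string could then be passed to the command above, for example:
# seqkit replace -p '(N{150,})' -r "$replacement" -s -i yourFastaFile.fa
```

Quoting "$replacement" keeps the shell from splitting or globbing the value.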

4.4 years ago

Linearize the FASTA so each sequence sits on a single line, then use sed:

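# Put each record on one header line and one sequence line, then cap runs of 150 or more Ns at 150.
awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' input.fa | \
tr "\t" "\n" |\
sed -r "/^[^>]/s/N{150,}/$(printf 'N%.0s' {1..150})/g"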
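
A sketch of the same linearize-then-sed idea wrapped in two functions, with a check on the longest run of Ns that remains. The names cap_n_runs and longest_n_run are made up for illustration, and the code assumes uppercase N only and a sed that accepts -E:

```shell
# Hypothetical helper: read FASTA on stdin and cap every run of Ns in the
# sequence lines at $1 characters.
cap_n_runs() {
    max=$1
    cap=$(printf '%*s' "$max" '' | tr ' ' 'N')
    # Join wrapped sequence lines first, so a run split across lines
    # is treated as a single run.
    awk '/^>/ { if (NR > 1) printf "\n"; print; next }
         { printf "%s", $0 }
         END { printf "\n" }' |
        sed -E "/^>/!s/N{$max,}/$cap/g"
}

# Hypothetical helper: print the length of the longest run of Ns found on
# any sequence line (header lines are skipped).
longest_n_run() {
    awk '!/^>/ {
             s = $0
             while (match(s, /N+/)) {
                 if (RLENGTH > m) m = RLENGTH
                 s = substr(s, RSTART + RLENGTH)
             }
         }
         END { print m + 0 }'
}
```

For example, cap_n_runs 150 < input.fa > capped.fa followed by longest_n_run < capped.fa should report a value no larger than 150. The output keeps each sequence on one line; rewrap it afterwards if a fixed line width is needed.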

Thank you very much for the help. It worked.


Please validate and close the answer by clicking on the green mark on its left.
