How to replace exactly 100 Ns
2.5 years ago
Anibesa • 0

Hello, I’m trying to find a way to replace a run of exactly 100 Ns in a FASTA sequence with some other character, e.g. Z.

It’s a multi-line FASTA file, too long to be converted to a single line, so sed won’t work.

Text editing tools will replace any 100 consecutive Ns, even inside a longer run. If the run is 99 or 101 Ns long, it shouldn’t be changed.

I’m stumped. Any suggestions will be appreciated.


Perfect! Thank you.


Please check the comment below.

2.5 years ago

A seqkit solution.

Here's some example data.

>seq_1
ATGNNNNNNNNNNTCGA
>seq_2
ATGNNGCGAGGCCCTTT
>seq_3
NNNNNNNNNNCCCCTGA
>seq_4
NNNGGTATTGCGTTATG
>seq_5
NNNNNNNNNNNNTTTAA
>seq_6
ATGNNNNNNNNNNNNGG
>seq_7
ATGTCCGNNNNNNNNNN
>seq_8
AAAGTCTAANNNNNNNN

Replace Ns with Zs only where a run is exactly 10 Ns long.

EDIT: Correction courtesy of shenwei356

seqkit replace -s -p "([^N]|^)N{10}([^N]|$)" -r $(echo -e '${1}'$(printf 'Z%.0s' {1..10})'${2}') test.fasta

And the result.

>seq_1
ATGZZZZZZZZZZTCGA
>seq_2
ATGNNGCGAGGCCCTTT
>seq_3
ZZZZZZZZZZCCCCTGA
>seq_4
NNNGGTATTGCGTTATG
>seq_5
NNNNNNNNNNNNTTTAA
>seq_6
ATGNNNNNNNNNNNNGG
>seq_7
ATGTCCGZZZZZZZZZZ
>seq_8
AAAGTCTAANNNNNNNN
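
For readers without seqkit, the same idea can be sketched in Python. This is a minimal sketch and not the method used above. It assumes that one record at a time fits in memory (not the whole file) and that Ns are upper case. It uses lookarounds instead of capture groups, so the flanking bases are never consumed. The names `replace_exact_runs` and `process_fasta` and the 60-column line width are hypothetical choices made for illustration.

```python
import re
from typing import Iterable, Iterator, List, Optional


def replace_exact_runs(seq: str, n: int = 100, old: str = "N", new: str = "Z") -> str:
    """Replace runs of exactly n `old` characters; shorter or longer runs are kept."""
    o = re.escape(old)
    # (?<!N) and (?!N) assert that the run is not part of a longer run,
    # without including the neighbouring bases in the match.
    pattern = re.compile(f"(?<!{o}){o}{{{n}}}(?!{o})")
    return pattern.sub(lambda m: new * n, seq)


def process_fasta(lines: Iterable[str], n: int = 100, width: int = 60) -> Iterator[str]:
    """Yield FASTA lines with exact-length N runs replaced, one record at a time."""
    header: Optional[str] = None
    chunks: List[str] = []

    def flush() -> Iterator[str]:
        # Join the wrapped lines so that runs spanning a line break are seen whole.
        if header is not None:
            yield header
        seq = replace_exact_runs("".join(chunks), n)
        for i in range(0, len(seq), width):
            yield seq[i:i + width]

    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            yield from flush()
            header, chunks = line, []
        elif line:
            chunks.append(line)
    yield from flush()


# Small demonstration with n=10; seq_7 is wrapped so its run crosses a line break.
example = """>seq_1
ATGNNNNNNNNNNTCGA
>seq_5
NNNNNNNNNNNNTTTAA
>seq_7
ATGTCCGNNNNN
NNNNN
""".splitlines()

result = list(process_fasta(example, n=10))
print("\n".join(result))
```

In the demonstration, seq_1 and seq_7 each contain a run of exactly 10 Ns and are rewritten, while seq_5 (12 Ns) is left alone. To run the sketch on a real file, an open file handle can be passed as `lines`, since file objects iterate line by line. Note that the output is re-wrapped at `width` columns, which may differ from the input's wrapping.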

With a replacement of only Zs, the two flanking non-N bases are lost (G and T for seq_1). Adding the capture groups ${1} and ${2} to the replacement keeps them.

$ seqkit replace -s -p "([^N]|^)N{10}([^N]|$)" -r $(printf 'Z%.0s' {1..10}) test.fasta
>seq_1
ATZZZZZZZZZZCGA

$ seqkit replace -s -p "([^N]|^)N{10}([^N]|$)" -r $(echo -e '${1}'$(printf 'Z%.0s' {1..10})'${2}') test.fasta 
>seq_1
ATGZZZZZZZZZZTCGA
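
The difference between the two commands can be reproduced with Python's `re` module, as a rough illustration only. seqkit is written in Go and refers to the first capture group as `${1}`, while Python writes it as `\g<1>`. The variable names below are illustrative.

```python
import re

# Same pattern as in the seqkit commands: a run of 10 Ns flanked by a non-N base
# or by the start/end of the sequence. The flanking bases are part of the match.
pattern = re.compile(r"([^N]|^)N{10}([^N]|$)")
seq = "ATGNNNNNNNNNNTCGA"  # seq_1 from the example data

# Replacement without the capture groups: the matched G and T are discarded.
lossy = pattern.sub("Z" * 10, seq)

# Replacement that puts both captured flanks back around the Zs.
fixed = pattern.sub(r"\g<1>" + "Z" * 10 + r"\g<2>", seq)

print(lossy)  # ATZZZZZZZZZZCGA
print(fixed)  # ATGZZZZZZZZZZTCGA
```

Because the flanking bases belong to the match, they have to be written back explicitly. Go's regular expression engine does not support lookarounds, which is presumably why the pattern uses capture groups rather than something like `(?<!N)N{10}(?!N)`.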

Thank you for the correction to my (obvious) mistake! I've edited the post with the correct answer.
