Hi.
I have a txt file containing multiple FASTA sequences, and I'd like to replace the Nth character in each sequence.
If I want to replace the 10th character of each sequence in the example below, which Linux command can I use?
> sample1
TATCCGATGCGACGTGCAGCG
> sample2
CTAGCGTAGTGTCGACTGCAT
> sample3
GACTGACGTGACGTAGTCGAC
Thank you!
Thank you, Alex! It works well. I understand the part between 'if' and 'else', but I can't really figure out what the command after 'else' means. Would you explain it to me? Thank you.
The printf() command prints a string made up of three format specifiers, %s, %c and %s, along with a newline character \n.
The first %s corresponds to the string value of substr($0, 1, 9).
The %c corresponds to the character value of X.
The second %s corresponds to the string value of substr($0, 11, length($0) - 10).
The substr() function returns the substring of the string passed in, generated from the starting index and length passed in.
So substr($0, 1, 9) takes the substring of the sequence line $0 from the start index of 1 (the first character) and grabs the first nine characters.
The second substr($0, 11, length($0) - 10) takes the substring of the sequence line starting at the 11th character, with a length equal to the sequence length minus ten characters.
See the awk documentation for more detail on string functions.
I really appreciate your detailed explanation. You rock!!!
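For anyone reading the thread later, the pieces explained above can be put together roughly as follows. This is only a sketch reconstructed from the explanation, not necessarily the exact command that was posted: the function name replace_tenth and the file name seqs.txt are made up for illustration, and the test for header lines (a line starting with ">") is an assumption about what the 'if' branch checks.

```shell
# Hypothetical helper: replace the 10th character of every sequence line with "X".
# Header lines (assumed to start with ">") are printed unchanged.
replace_tenth() {
  awk '{
    if ($0 ~ /^>/)
      print $0                      # FASTA header: pass through as-is
    else
      # first nine characters + "X" + everything from the 11th character on
      printf("%s%c%s\n", substr($0, 1, 9), "X", substr($0, 11, length($0) - 10))
  }' "$@"
}

# Example input taken from the question (seqs.txt is an assumed file name).
cat > seqs.txt <<'EOF'
> sample1
TATCCGATGCGACGTGCAGCG
> sample2
CTAGCGTAGTGTCGACTGCAT
> sample3
GACTGACGTGACGTAGTCGAC
EOF

replace_tenth seqs.txt
```

With the example input, the first sequence line comes out as TATCCGATGXGACGTGCAGCG. To replace a different position N, the 9, 11 and 10 would become N-1, N+1 and N respectively.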