Hi, I want to add 2 nucleotides at the beginning of each line in a FASTA file. For example, for this input:
>
GCATAGGC
the desired output is:
>
TAGCATAGGC
Can someone help?
seqkit mutate can edit FASTA sequences (point mutation, insertion, deletion). Please use v0.14.0rc1 or later, which fixes a bug in insertion.
seqkit mutate -i supports inserting bases at any position. For example, for two (multi-line) sequences:
$ cat seqs.fa
>seq1
GCATAGGC
>seq2
AAACCC
GGGTTT
1) At the beginning:
$ cat seqs.fa | seqkit mutate -i 0:TA
>seq1
TAGCATAGGC
>seq2
TAAAACCCGGGTTT
2) At the end:
$ cat seqs.fa | seqkit mutate -i -1:TA
>seq1
GCATAGGCTA
>seq2
AAACCCGGGTTTTA
3) After the 5th base:
$ cat seqs.fa | seqkit mutate -i 5:TA
>seq1
GCATATAGGC
>seq2
AAACCTACGGGTTT
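If seqkit isn't available, the "at the beginning" case can be approximated with plain awk. This is a minimal sketch, assuming headers start with `>` and that the prefix belongs on the first non-empty sequence line of each record. Unlike seqkit, it keeps the original line wrapping, so the first line of each record simply grows by two characters. The variable name `prefix` and the file names are my own choices.

```shell
# Example input: two records, the second one spanning two lines
printf '>seq1\nGCATAGGC\n>seq2\nAAACCC\nGGGTTT\n' > seqs.fa

# Prepend $prefix to the first non-empty sequence line after each header.
# The remaining lines of each record are printed unchanged.
awk -v prefix="TA" '
  /^>/        { print; first = 1; next }
  first && NF { $0 = prefix $0; first = 0 }
              { print }
' seqs.fa > seqs.prefixed.fa

cat seqs.prefixed.fa
```

For the two records above this gives `TAGCATAGGC` for seq1 and `TAAAACCC` / `GGGTTT` for seq2.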
If each sequence occupies exactly one line and is written in capital letters (this works for both nucleotide and amino acid sequences; you can replace [A-Z] with [ATGC] if you want to be more specific):
sed '/^[A-Z]/s/^/TA/' file.fasta > output.fasta
If you also have multi-line sequences, you can first convert them to one-line sequences with this command:
awk '/^>/ {if (NR > 1) printf("\n"); printf("%s\n", $0); next} {printf("%s", $0)} END {printf("\n")}' input.fasta > file.fasta
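Putting the two steps together, here is a sketch of the full pipeline on a small made-up file (the file names are placeholders). The linearizer here only emits a newline before headers after the first one, so the file does not start with an empty line. It also uses `/^>/!` instead of `[A-Z]`, on the assumption that lowercase (soft-masked) sequence lines should be prefixed too.

```shell
# Small multi-line example input (hypothetical file name)
printf '>seq1\nGCATAGGC\n>seq2\nAAACCC\nGGGTTT\n' > input.fasta

# Step 1: join each record's sequence lines into a single line.
awk '/^>/ { if (NR > 1) printf("\n"); print; next }
          { printf("%s", $0) }
     END  { printf("\n") }' input.fasta > file.fasta

# Step 2: prefix every line that is not a header.
sed '/^>/!s/^/TA/' file.fasta > output.fasta

cat output.fasta
```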
Close. Just skip the header lines (assuming each sequence is on a single line):
$ sed '/^>/! s/^/TA/' test.fa
Or, with GNU sed (the first~step address form is a GNU extension), prefix every second line:
$ sed '0~2 s/^/TA/' test.fa
With awk:
$ awk -v OFS="\n" '/^>/ {getline seq; print $0,"TA"seq}' test.fa
$ awk '{print ((NR%2)? "":"TA") $0}' test.fa
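The line-number-based commands (`0~2` and `NR%2`) quietly produce a corrupted file if any sequence wraps onto a second line. A quick sanity check, sketched here as one possible approach (the function name `is_two_line` is hypothetical), is to confirm that headers sit exactly on the odd lines before running them:

```shell
# Two example files: one strictly alternating, one with a wrapped sequence
printf '>seq1\nGCATAGGC\n>seq2\nAAACCCGGGTTT\n' > twoline.fa
printf '>seq1\nGCATAGGC\n>seq2\nAAACCC\nGGGTTT\n' > wrapped.fa

# Succeeds if every odd line is a header, every even line is not,
# and the file has an even number of lines; fails otherwise.
is_two_line() {
  awk '(NR % 2 == 1) != ($0 ~ /^>/) { bad = 1; exit }
       END { exit (bad || NR % 2) }' "$1"
}

for f in twoline.fa wrapped.fa; do
  if is_two_line "$f"; then
    echo "$f: one line per sequence, safe for 0~2 / NR%2"
  else
    echo "$f: wrapped sequences, linearize first"
  fi
done
```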
When sequences in the FASTA file may span multiple lines and the resulting FASTA should be well-formed (wrapped at the same length), you need to chain more commands.
My best bet uses both bioawk and seqkit (both are installable with bioconda):
cat foo.fa | bioawk -v prefix="TATA" -c fastx '{ printf(">%s\n%s%s\n", $name, prefix, $seq) }' | seqkit seq
prints:
>foo
TATAATGGACTCTCGTCCTCAGAAAGTCTGGATGACGCCGAGTCTCACTGAATCTGACAT
GGATTACCACAAGATCTTGACAGCAGGTCTGTCCGTTCAACAGGGGGTTGTTCGGCAAAG
AGTCATCCCAGTGTATCAAGTAAACAATCTTGAGATCCCAGTGTATCAAGTAAACAATCT
TGAGATCCCAGTGTATCAAGTAAACAATCTTGAGATCCCAGTGTATCAAGTAAACAATCT
TGAGATCCCAGTGTATCAAGTAAACAATCTTGAG
This uses the trick shown in A: Fasta file edition.
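Where bioawk and seqkit aren't installed, the same prefix-and-rewrap idea can be sketched in plain awk. This is a sketch under my own assumptions: it buffers one record at a time, passes header lines through unchanged (including any description after the ID), and rewraps at `width` columns. The output above is wrapped at 60 columns, so use `width=60` to match it; the demo uses `width=8` only so the wrapping is visible on a tiny input.

```shell
printf '>foo desc\nACGTACGTAC\nGTACG\n>bar\nTTTT\n' > foo.fa

# Prefix each sequence with $prefix and rewrap it at $width columns.
awk -v prefix="TATA" -v width=8 '
  function flush(   i) {
    if (hdr == "") return
    print hdr
    seq = prefix seq
    for (i = 1; i <= length(seq); i += width)
      print substr(seq, i, width)
  }
  /^>/ { flush(); hdr = $0; seq = ""; next }
       { seq = seq $0 }
  END  { flush() }
' foo.fa > foo.prefixed.fa

cat foo.prefixed.fa
```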
What have you tried? This can be done with a sed command that matches the first character and replaces the line-beginning anchor with TA:
sed -i 's/^/TA/' file.fasta
That does not match the first character in each line. You'll end up adding TA to the header lines too, and before the > at that, essentially corrupting the FASTA file. Also, don't use -i until you're 100% sure the command is exactly what you want.
Yes, it does add TA to the header. Then what exactly should the command be?
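Following the advice about -i, one cautious workflow is to write to a new file, check that the header lines came through untouched, and only then replace the original. A minimal sketch (file names are placeholders; checking headers alone does not validate the sequence lines):

```shell
printf '>seq1\nGCATAGGC\n>seq2\nAAACCC\n' > file.fasta

# 1. Write the edited copy to a new file instead of using sed -i.
sed '/^>/!s/^/TA/' file.fasta > file.tmp.fasta

# 2. The header lines must be identical, in the same order, in both files.
grep '^>' file.fasta > headers.before
grep '^>' file.tmp.fasta > headers.after

# 3. Replace the original only if the check passes.
if cmp -s headers.before headers.after; then
  mv file.tmp.fasta file.fasta
else
  echo "header lines changed; keeping file.fasta as is" >&2
fi
rm -f headers.before headers.after
```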
amitpande74, please accept all answers that solve your question.
A: Fasta file edition
Replace "ACTG" with "TA".