Question

How to create a fasta file from a list of sequences

0

20 months ago

Alex S ▴ 20

I have a txt file with more than a thousand DNA sequences as follows:

seq-name1 DNA-sequence1
seq-name2 DNA-sequence2
seq-name3 DNA-sequence3

Does anyone know code to convert this file into a FASTA file like the following?

>seq-name1
DNA-sequence1
>seq-name2
DNA-sequence2
>seq-name3
DNA-sequence3

fasta sequences DNA • 1.2k views

score 2 · Answer 1 · 2023-03-16

20 months ago

Mensur Dlakic ★ 28k

The command below prints > followed by the contents of the first column, then a newline character (\n), then the second column. It is a fairly simple operation, and you should be able to find many similar solutions by searching this site or the wider internet.

awk '{print ">"$1"\n"$2}' input.txt > output.fas
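
For anyone who would rather script this, here is a minimal Python sketch of the same two-column conversion. It is not part of the original answer; the function names `table_to_fasta` and `convert_file` and the example sequences are made up for illustration. It assumes each non-empty line holds a name and a sequence separated by whitespace, and unlike the awk one-liner it skips blank lines and raises an error on lines with fewer than two fields.

```python
from typing import Iterable, Iterator


def table_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from 'name sequence' rows (hypothetical helper)."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # split on any run of whitespace, as awk does
        if not fields:
            continue  # skip blank lines instead of emitting an empty record
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected a name and a sequence")
        yield f">{fields[0]}"  # header line: '>' plus the first column
        yield fields[1]        # sequence line: the second column


def convert_file(src: str, dst: str) -> None:
    """Read the two-column text file `src` and write FASTA to `dst`."""
    with open(src) as fin, open(dst, "w") as fout:
        for out_line in table_to_fasta(fin):
            fout.write(out_line + "\n")


# Small in-memory demonstration with placeholder sequences.
rows = ["seq-name1 ACGT", "seq-name2 GGCC"]
print("\n".join(table_to_fasta(rows)))
```

Calling convert_file("input.txt", "output.fas") should write the same kind of output as the awk command, provided the input follows the two-column layout described in the question.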

score 0 · Answer 2 · 2023-03-16

20 months ago

tothepoint ▴ 940

You can try sed with two capture groups, the first matching the name and the second the sequence (GNU sed syntax): sed -E 's/^(\S+)\s+(\S+).*$/>\1\n\2/' input_file > output_file

score 0 · Answer 3 · 2023-03-16

20 months ago

size_t ▴ 120

With Perl: perl -ane 'print ">$F[0]\n$F[1]\n";' in > out.fa