Question

fasta file to tab delimited file

3

8.3 years ago

nakanomasayuki265 ▴ 100

I want to change the format of a FASTA file.

>Name
AAAAAAAAAAAAAAAAAAAAAAAAA
>Fasta
BBBBBBBBBBBBBBBBBBBBBBBBBB
·
·
·

Each sequence in the FASTA file is on a single line; the only other lines are the > header lines.

I would like to convert this to a tab-delimited format:

#Name AAAAAAAAAAAAAAAAAAAAAAAAA
#Fasta BBBBBBBBBBBBBBBBBBBBBBBBB
#·
#·
#·

What commands or scripts can do this?

sequence • 17k views

ADD COMMENT • link updated 6.8 years ago by scchess ▴ 640 • written 8.3 years ago by nakanomasayuki265 ▴ 100

0

This sounds like an XY problem. Can you explain what you are trying to accomplish?

ADD REPLY • link 8.3 years ago by Brian Bushnell 20k

5

6.8 years ago

scchess ▴ 640

Please use the seqkit tool. The accepted solution does not work when a sequence spans multiple lines, so it should be ignored.

seqkit fx2tab myFASTA > myTAB

ADD COMMENT • link updated 6.8 years ago by finswimmer 16k • written 6.8 years ago by scchess ▴ 640

0

will not work for multiple lines in FASTQ

FASTQ has only one sequence line (of significance, at least).
The OP asked for FASTA to TSV, not FASTQ to TSV.

ADD REPLY • link 6.8 years ago by Ram 45k

0

My sample command did indeed convert FASTA to TSV.

ADD REPLY • link 6.8 years ago by scchess ▴ 640

0

Yes, but the accepted answer does work on multiple lines, unless I'm missing something. RS=> should take care of not separating records by \n.

ADD REPLY • link 6.8 years ago by Ram 45k

0

The accepted answer had "tail -n+2"; it wouldn't work for multiple lines.

ADD REPLY • link 6.8 years ago by scchess ▴ 640

0

How so? Can you explain please?

ADD REPLY • link 6.8 years ago by Ram 45k

0

$ cat test.fa 
>Name
AAAAAAAAAAA
AAAAAA

>Fasta
BBBBBBBBBBBBBB
BBBBB
B
BBBBBB

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' test.fa | tail -n+2
#Name   AAAAAAAAAAA
#Fasta  BBBBBBBBBBBBBB

$ seqkit fx2tab test.fa
Name    AAAAAAAAAAAAAAAAA   
Fasta   BBBBBBBBBBBBBBBBBBBBBBBBBB

or a simple case:

$ cat test.fa
>Name
AAAAA A

>Fasta
B
BBBBBB

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' test.fa | tail -n+2
#Name   AAAAA
#Fasta  B
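
The truncation seen here seems to come from field splitting rather than from tail: RS=">" does split the file into one record per sequence, but awk's default field separator is any whitespace, newlines included, so every wrapped line (or space-separated chunk) becomes its own field and $2 holds only the first one. A small sketch to make that visible; count_fields is a hypothetical helper name, not something from the thread:

```shell
# Hypothetical helper: for each FASTA record, report how many
# whitespace-separated fields follow the name when RS=">".
count_fields() {
    awk 'BEGIN { RS = ">" }
         # NF > 0 skips the empty record before the first ">"
         NF > 0 {
             n = NF - 1
             print $1 ": " n " sequence field(s)"
         }' "$1"
}
```

Running count_fields test.fa on the first test.fa in this comment should report 2 sequence fields for Name and 4 for Fasta, while print "#"$1"\t"$2 only ever emits the first of them.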

ADD REPLY • link 6.8 years ago by cpad0112 21k

0

This should work for multiline fasta:

$ awk -v RS=">" -v ORS="\n" -v OFS="" '{$1="#"$1"\t"}1' test.fa|tail -n+2
#Name   AAAAAAAAAAAAAAAAA
#Fasta  BBBBBBBBBBBBBBBBBBBBBBBBBB

$ cat test.fa   
>Name
AAAAAAAAAAA
AAAAAA

>Fasta
BBBBBBBBBBBBBB
BBBBB
B
BBBBBB
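
A possible variant, offered as a sketch rather than a tested replacement: as far as I can tell, the one-liner above joins every whitespace-separated field, so a header with a description (">Name some text") would have the description merged into the sequence. Setting FS="\n" keeps the whole header line as $1 and treats each remaining line as a sequence chunk. The function name fasta_to_tab is hypothetical, and the sketch assumes no ">" characters inside header lines.

```shell
# Hypothetical helper: convert (possibly multi-line) FASTA to "#name<TAB>sequence".
# Assumes ">" appears only at the start of header lines.
fasta_to_tab() {
    awk 'BEGIN { RS = ">"; FS = "\n" }
         # skip the empty record before the first ">"
         length($1) > 0 {
             name = $1
             sub(/\r$/, "", name)          # tolerate Windows line endings
             seq = ""
             for (i = 2; i <= NF; i++)     # join all sequence lines
                 seq = seq $i
             gsub(/[ \t\r]/, "", seq)      # drop gaps and stray whitespace
             print "#" name "\t" seq
         }' "$1"
}
```

Usage would be along the lines of fasta_to_tab test.fa > out.txt. The leading "#" is kept only because the other commands in this thread print it; drop it from the print statement if it is not wanted.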

ADD REPLY • link 6.8 years ago by finswimmer 16k

0

Thank you! This is great!

ADD REPLY • link 5.5 years ago by lagartija ▴ 160

0

@SmallChess: tail -n+2 removes the unwanted first line. However, as you mentioned, the code doesn't work for multi-line FASTA or FASTA with gaps in the sequence.

ADD REPLY • link 6.8 years ago by cpad0112 21k

score 5 · Accepted Answer · 2017-02-05

5

8.3 years ago

Alex Reynolds 36k

Sure, just use awk:

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' in.fa | tail -n+2 > out.txt

ADD COMMENT • link 8.3 years ago by Alex Reynolds 36k

1

Alternative: awk 'BEGIN{RS=">";OFS="\t"}NR>1{print "#"$1,$2}' inFile > outFile

ADD REPLY • link 8.3 years ago by 5heikki 11k

0

Hey, do you know how to convert a tab-delimited file back to FASTA format?

ADD REPLY • link 8.0 years ago by yangzituo • 0

0

For example, from:

seq1    AAAATTTT
seq2    CCCCGGGG

convert it back to:

>seq1
AAAATTTT
>seq2
CCCCGGGG

Thanks~

ADD REPLY • link 8.0 years ago by yangzituo • 0

2

Use seqkit:

seqkit tab2fx xxx.tab > xxx.fasta
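
If seqkit is not available, a plain awk sketch of the reverse direction might look like the following. It assumes the input really is tab-separated with the name in column 1 and the sequence in column 2; tab_to_fasta is a hypothetical helper name. Stripping a leading "#" is an added assumption, there only because the awk commands elsewhere in this thread prefix names with it.

```shell
# Hypothetical helper: convert "name<TAB>sequence" lines back to FASTA.
tab_to_fasta() {
    awk 'BEGIN { FS = "\t" }
         # ignore blank or malformed lines with fewer than two columns
         NF >= 2 {
             name = $1
             sub(/^#/, "", name)   # assumption: drop a leading "#" if present
             print ">" name
             print $2
         }' "$1"
}
```

Usage would be something like tab_to_fasta xxx.tab > xxx.fasta. Each sequence is written on one line, without wrapping.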

ADD REPLY • link 8.0 years ago by shenwei356 8.7k