Question

Extract first and last column of fasta-header

2.8 years ago

genomes_and_MGEs ▴ 10

Hi everyone,

I have a multi-FASTA file named multi.fasta with the following structure:

>A 124 B
ATCGTA...
>C 567 D
GTCAG...

My goal is to create a new file in which the FASTA headers contain only the first and last columns. If I use

awk -F" " '/>/ {print $1,$(NF)}' multi.fasta > modified_multi.fasta

it prints the FASTA headers with the first and last columns, but not the nucleotide sequences. Can you guys please help me out?

sequence • 1.5k views

score 2 · Answer 1 · 2022-06-27

2.8 years ago

Pierre Lindenbaum 166k

awk '/^>/ {print $1,$(NF);next;} {print;}' multi.fasta > modified_multi.fasta
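
For reference, the same approach can be sketched end to end with comments. The sample records are copied from the question (including the literal "..." placeholders), and the file names multi.fasta and modified_multi.fasta are assumptions for illustration.

```shell
# Recreate the two-record example from the question (assumed file name).
printf '>A 124 B\nATCGTA...\n>C 567 D\nGTCAG...\n' > multi.fasta

# Header lines (starting with ">"): print only the first and last fields,
# then "next" skips the remaining rules for that line.
# Every other line (the sequence) is printed unchanged.
awk '/^>/ {print $1, $NF; next} {print}' multi.fasta > modified_multi.fasta

cat modified_multi.fasta
```

The original attempt lost the sequences because it had only the header rule; the second rule is what passes the non-header lines through.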

score 1 · Answer 2 · 2022-06-27

2.8 years ago

cpad0112 21k

$ awk '/^>/{$2 =""}1' test.fa

>A  B
ATCGTA...
>C  D
GTCAG...

$ awk '/^>/{print $1,$NF}!/>/' test.fa
$ cut -f1,3 -d" " test.fa
$ sed -r '/^>/ s/\s.*\s/ /1' test.fa
$ tr -d 0-9 <test.fa | tr -s " "

>A B
ATCGTA...
>C D
GTCAG...
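
Note that the cut command above relies on the headers having exactly three columns, and the tr command relies on the identifiers and sequences containing no digits. If that may not hold, one possible variant (a sketch, not part of the commands above) is to rebuild each header line from its first and last fields. The second header below has extra columns and is a hypothetical example; the output file name is also an assumption.

```shell
# Example input: one three-column header and one hypothetical longer header.
printf '>A 124 B\nATCGTA...\n>C 567 extra words D\nGTCAG...\n' > test.fa

# On header lines, replace the whole line with "first-field last-field";
# the trailing 1 is an always-true pattern that prints every line.
awk '/^>/ {$0 = $1 OFS $NF} 1' test.fa > test.first_last.fa

cat test.first_last.fa
```

Because the header is rebuilt rather than edited in place, it also avoids the double space left by emptying $2.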

score 0 · Answer 3 · 2022-06-28

2.8 years ago

Hugo ▴ 380

You can use the "Rename header / Multipart header" operation of SEDA (https://www.sing-group.org/seda/manual/operations.html#multipart-header); it is very useful for this kind of FASTA header.
