Question

How to remove part of a FASTA header and keep only the desired part on Ubuntu?


3.2 years ago

Jelo • 0

Hi all,

I have a fasta file with this header

>10005_M12.fastq    Otu0001|242290|M1.fastq-M12.fastq-M5.fastq-URTM6.fastq-M7.fastq-M9.fastq

I want to remove every part of the header except the OTU label (with its number). I used this command:

sed 's/>M.*Otu/>Otu/g' rep.fasta | sed -e 's/|.*//g' > rep.otu.fasta

but it removed only the part after the OTU label, giving:

>10005_M12.fastq    Otu0001

I want the header to look like this: >Otu0001

Any advice will be appreciated

Thank you
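The original command falls short because its first pattern, >M.*Otu, needs the header to start with >M, while this header starts with >10005_M12. Nothing is replaced there, so only the second sed (which cuts at the first |) has any effect. A minimal corrected version of the same two-step idea is sketched below. It assumes Otu appears only once per header line, and rep.fasta is a made-up one-record sample built from the header above (the sequence is invented).

```shell
#!/bin/sh
# Made-up sample input: the header from the question plus an invented sequence.
cat > rep.fasta <<'EOF'
>10005_M12.fastq    Otu0001|242290|M1.fastq-M12.fastq-M5.fastq-URTM6.fastq-M7.fastq-M9.fastq
ACGTACGTAC
EOF

# Both substitutions run on header lines (those starting with '>') only:
#   1) replace everything from the leading '>' through 'Otu' with '>Otu'
#   2) delete everything from the first '|' to the end of the line
sed -e '/^>/s/^>.*Otu/>Otu/' -e '/^>/s/|.*//' rep.fasta > rep.otu.fasta

cat rep.otu.fasta
```

Restricting both substitutions to lines matching /^>/ leaves the sequence lines untouched.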


Thank you all for the help.

Reply by Jelo, 3.2 years ago


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

Reply by GenoMax 147k, 3.2 years ago


3.2 years ago

cpad0112 21k

If the sequence lines contain no |, try this:

$ awk -F "|" '{print $1}' test.fa

If you are not sure, use this instead, which splits only the header lines:

$ awk -F "|" '/^>/ {print $1}; !/^>/' test.fa

Or this:

$ awk -F "|" '{print ($0 ~ /^>/)?$1:$0}' test.fa
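The awk commands above cut each header at the first |, so the >10005_M12.fastq prefix is still there afterwards. If the goal is exactly >Otu0001, one option is to extract the label with awk's match(), which sets RSTART and RLENGTH to the position and length of the match. This is a sketch, assuming the label is always Otu followed by digits; test.fa is a made-up sample built from the header in the question.

```shell
#!/bin/sh
# Made-up sample input: the header from the question plus an invented sequence.
cat > test.fa <<'EOF'
>10005_M12.fastq    Otu0001|242290|M1.fastq-M12.fastq-M5.fastq-URTM6.fastq-M7.fastq-M9.fastq
ACGTACGTAC
EOF

# Rule 1: on a header line containing "Otu" + digits, print only ">" and that label.
# Rule 2: every other line (sequences, headers without a label) is printed as-is.
awk '/^>/ && match($0, /Otu[0-9]+/) { print ">" substr($0, RSTART, RLENGTH); next }
     { print }' test.fa > test.otu.fa

cat test.otu.fa
```

Headers with no Otu label fall through to the second rule and are printed unchanged, so they remain easy to spot.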


score 3 · Accepted Answer · 2021-09-12


3.2 years ago

Pierre Lindenbaum 164k

sed '/^>/s/.*[ \t]*\(Otu[0-9]*\).*/>\1/' in.fa
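One caveat with this sed: a header with no Otu label does not match the pattern, so it passes through unchanged. A quick check afterwards, sketched below, counts the headers that are not exactly >Otu followed by digits. in.fa is a made-up sample; its second record is hypothetical and exists only to show what an unconverted header looks like.

```shell
#!/bin/sh
# Made-up sample: the header from the question, plus a hypothetical header with no Otu label.
cat > in.fa <<'EOF'
>10005_M12.fastq    Otu0001|242290|M1.fastq-M12.fastq-M5.fastq-URTM6.fastq-M7.fastq-M9.fastq
ACGTACGTAC
>10006_M12.fastq    unclassified|12|M1.fastq
GGGGCCCCAA
EOF

# The sed from the answer above.
sed '/^>/s/.*[ \t]*\(Otu[0-9]*\).*/>\1/' in.fa > out.fa

# Take the header lines and count those NOT of the form ">Otu<digits>".
bad=$(grep '^>' out.fa | grep -vc '^>Otu[0-9][0-9]*$')
echo "headers not converted: $bad"
```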


score 2 · Accepted Answer · 2021-09-12


3.2 years ago

rpolicastro 13k

A seqkit answer too, for posterity:

seqkit replace -p "\|.*" in.fa


seqkit replace -p "^.+\s|\|.*" foo.fasta

or

seqkit replace -p ".+\s(\w+)\|.+" -r "\$1" foo.fasta

or just

seqkit seq -i --id-regexp "\s(\w+)\|" foo.fasta

Reply by shenwei356 8.7k, 3.2 years ago
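Whichever command is used, the shortened headers should presumably still be unique, since a representative-sequence file normally holds one sequence per OTU. A quick duplicate check in plain shell (so it does not need seqkit) is sketched below; rep.otu.fasta here is a made-up sample with one deliberately repeated label.

```shell
#!/bin/sh
# Made-up sample: the label Otu0002 appears twice on purpose.
cat > rep.otu.fasta <<'EOF'
>Otu0001
ACGTACGTAC
>Otu0002
GGGGCCCCAA
>Otu0002
TTTTAAAACC
EOF

# Print every header line that occurs more than once (no output means all are unique).
grep '^>' rep.otu.fasta | sort | uniq -d
```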