Remove whitespace from FASTA files, except in FASTA headers

3.7 years ago

genomes_and_MGEs ▴ 10

Hey everyone,

I have a multi-fasta file like this:

>NC_000914 464618..534825
gtgccttccattttggagcgggaccaaatcgcagcggttctggtaagtgcgagcagggac gtgccttccattttggagcgggaccaaatcgcagcggttctggtaagtgcgagcagggac
aaaacgccggccggcttgcgggaccatgcgatattacaactgctcgccacctacggactg aaaacgccggccggcttgcgggaccatgcgatattacaactgctcgccacctacggactg
cgatcaggagaaatccgcaacatgcggattgaggatatcgattggcggaccgaaaccatt cgatcaggagaaatccgcaacatgcggattgaggatatcgattggcggaccgaaaccatt

I would like to remove whitespace from the FASTA sequences, but keep the whitespace in the FASTA headers (the lines starting with >). I use the command sed -i '/^>/ s/ .*//' file.fasta to remove whitespace from the FASTA headers, but now I want the opposite. Is this possible?

Thanks!

sequence • 1.9k views

updated 3.7 years ago by cpad0112 21k • written 3.7 years ago by genomes_and_MGEs ▴ 10

Negate the header match in your current command. But be careful when using -i, and note that your current command is not correct for removing only the spaces in the header.
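To illustrate both cautions, here is a minimal sketch. The file name demo_header.fa and its contents are made up for the example. Running sed without -i only prints the result, so it can be checked before any file is changed. If in-place editing is still wanted, giving -i a suffix such as .bak keeps a copy of the original.

```shell
# Hypothetical two-line example file (name and contents are assumptions).
printf '%s\n' '>NC_000914 464618..534825 extra' 'gtgcc ttcca' > demo_header.fa

# Without -i, sed writes to stdout and leaves the file untouched,
# so the effect of the command can be inspected first.
sed '/^>/ s/ .*//' demo_header.fa
# The header is printed as ">NC_000914": the first space and everything
# after it are deleted, not just the spaces.

# For in-place editing, a suffix after -i keeps a backup of the original.
sed -i.bak '/^>/ s/ .*//' demo_header.fa
ls demo_header.fa demo_header.fa.bak
```

The backup demo_header.fa.bak still holds the full header, so a wrong substitution can be undone.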

3.7 years ago by cpad0112 21k

Thank you for the reply. But how do I negate that? Yes, I'll be careful with -i, thanks for the tip.

3.7 years ago by genomes_and_MGEs ▴ 10

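Use /^>/! to negate the address, so that the command applies to every line that does not start with >. Your current command removes the first space and everything after it, so do not use that substitution for this purpose. Try this: sed '/^>/! s/\s\+//g' test.fa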
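As a quick check of the negated address, here is a small sketch on a made-up file (demo.fa is a hypothetical name). Note that \s and \+ are GNU sed extensions. The variant below swaps in the POSIX class [[:space:]] with the g flag, which should give the same result and also work with other sed implementations.

```shell
# Hypothetical example file with spaces in the header and in the sequence lines.
printf '%s\n' \
  '>NC_000914 464618..534825' \
  'gtgcc ttcca  ttttg' \
  'aaaac gccgg' > demo.fa

# On lines that do NOT start with '>', delete every whitespace character.
# [[:space:]] is used here instead of the GNU-only \s\+ (an assumption about
# portability needs, not part of the original command).
sed '/^>/!s/[[:space:]]//g' demo.fa > demo.clean.fa

cat demo.clean.fa
```

The header line keeps its space, while the sequence lines come out without any. Writing to a new file instead of using -i leaves demo.fa available for comparison.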

3.7 years ago by cpad0112 21k
