Question

Removing whitespace from the beginning of the second field (sequence) in a FASTA file

5.3 years ago

Angie11 • 0

Hello,

Does anyone know of a Linux command-line tool, such as sed, that removes whitespace from one specific field only? I have two tab-separated fields in the format shown below, and I would like to remove the whitespace at the beginning of the second field (the start of the sequence) without removing any whitespace from the first field. The file is in FASTA format, but I can convert it to a tab-delimited text file if needed.

>10_GL0000024 root|cellular organisms|Bacteria|Firmicutes locus=scaffold18562_3:3421:4365:- [Complete]
 MELTFQTATPAERLYTTGQSMQIEGQMGYIGCLQTGMSEDGKGAFPKWSSGREGLNTEEFQQELAGVMDALIHDEQYGGFLKDSDAMRDFCQTHPESGFNNGFAFGFRADTAQYSYLIRLNPCKGEENLSICCYRRDWLDSHMKHAEKGIRFITPHYKEKFRIADGDKVRIRRFDGQVFDRVCRYIDDCHVEIGSELYHICQFAEIMERNGNSVIPLRSSLPFVCYGKVPEKRAIVMFERGFDGYRSASFATKGRTSQKLVDELNGELGVTKAQAAAMQGGATQGWASPAADPKNYDEQGQPIKPRHRDRGDAR

Thank you! Angie

sequence protein fasta linux • 2.3k views

score 0 · Answer 1 · 2019-08-26

5.3 years ago

bari.ballew ▴ 470

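Take a look at regex anchors, which tie your pattern to the beginning ("^") or end ("$") of a line. In a FASTA file the sequence sits on its own line, so a pattern anchored with "^" removes whitespace only at the start of each line and leaves the spaces inside the header untouched. "[[:blank:]]" matches spaces and tabs; it is used here instead of "[\t ]" because not every sed (BSD/macOS sed, for example) treats "\t" inside brackets as a tab.

sed 's/^[[:blank:]]*//' file.fa > secondFile.fa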
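A quick way to check the anchored substitution before running it on a whole file: the sketch below writes the header from the question plus only the first residues of its sequence to a temporary file (created with mktemp, so the file name is just for the demo) and prints the result. It uses the POSIX class [[:blank:]], so it should behave the same in GNU and BSD sed.

```shell
#!/bin/sh
# Demo: strip leading spaces/tabs from every line of a small FASTA file.
# The sequence is shortened to its first residues for readability.
set -eu

fasta=$(mktemp)                 # throwaway file for the demo
trap 'rm -f "$fasta"' EXIT      # clean up when the script exits

# Header line, then a sequence line that starts with a space.
printf '%s\n' \
  '>10_GL0000024 root|cellular organisms|Bacteria|Firmicutes locus=scaffold18562_3:3421:4365:- [Complete]' \
  ' MELTFQTATPAERLYTTGQSMQIEGQMGYIGC' > "$fasta"

# ^ anchors the match at the start of each line; [[:blank:]]* matches
# any run of spaces and tabs there (including none, so other lines pass through).
sed 's/^[[:blank:]]*//' "$fasta"
```

Only the sequence line changes: the spaces inside the header survive because none of them is at the start of a line.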

score 0 · Answer 2 · 2019-08-26

5.3 years ago

Chris Miller 22k

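If this is a FASTA file, then you'll always want to remove leading whitespace, so something like:

sed 's/^[[:space:]]*//' myfile.fa

oughta work fine. The "*" makes it strip every leading space or tab rather than just the first one, and sed writes to standard output, so redirect the output to a new file to keep it.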
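If the file is converted to the tab-delimited layout mentioned in the question (header in field 1, sequence in field 2), a "^" in sed would anchor at the header rather than at the sequence, so the substitution has to target field 2 directly. A minimal sketch with awk, assuming each record is exactly "header<TAB>sequence"; the temporary file and the shortened sequence are only for the demo:

```shell
#!/bin/sh
# Sketch for the tab-delimited layout: field 1 = header, field 2 = sequence.
# Only field 2 is trimmed; spaces inside the header are left alone.
set -eu

tsv=$(mktemp)                   # throwaway file for the demo
trap 'rm -f "$tsv"' EXIT

# One record: header, a tab, then a sequence that starts with a space.
printf '%s\t%s\n' \
  '>10_GL0000024 root|cellular organisms|Bacteria|Firmicutes locus=scaffold18562_3:3421:4365:- [Complete]' \
  ' MELTFQTATPAERLYTTGQSM' > "$tsv"

# -F'\t' splits on tabs only, so the header's spaces stay in $1.
# sub() with a ^-anchored regex removes leading spaces from $2 alone;
# OFS keeps the rebuilt record tab-separated.
awk -F'\t' 'BEGIN { OFS = "\t" } { sub(/^ +/, "", $2); print }' "$tsv"
```

Because the tab is the field separator, field 2 can only begin with spaces, which is why the regex is /^ +/ rather than a class that also includes tabs.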