Hi all,
I'm dealing with a FASTA file that has spaces at the end of its lines, which is causing problems. I haven't found a suitable way to remove them. Could you please tell me the appropriate command for removing them?
Is the problem an extra space, or a missing end-of-line character?
If your data are tab-separated and you only have a space at the end of the line, you can do the following:
cat file.fasta | tr -d " " > newfile.fasta
But note that this will get rid of all spaces, including those in the middle of the line.
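To see the difference on a single line, here is a minimal Python sketch. The sample line is made up for illustration; it is not taken from the original file. Deleting every space, as tr -d does, also joins whatever was separated in the middle, while stripping only the right-hand end leaves the rest alone.

```python
# Hypothetical line with one internal space and two trailing spaces.
line = "ACGT ACGT  "

# Equivalent of tr -d " ": every space is deleted, including the internal one.
all_removed = line.replace(" ", "")

# Only the spaces at the end of the line are removed.
trailing_removed = line.rstrip(" ")

print(repr(all_removed))
print(repr(trailing_removed))
```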
sed 's/ *$//g' in.fasta > out.fasta
will remove only spaces at the end of lines. To remove trailing tabs as well as spaces, use:
sed 's/\s*$//g' in.fasta > out.fasta
Not a bioinformatics question; you should try Stack Overflow for this, but here is a quick answer in Perl:
perl -i.bak -pe 's/\h+$//' sequences.fa
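For anyone who prefers Python, roughly the same in-place edit with a backup copy can be sketched with the standard fileinput module. This is only an approximation of the Perl one-liner: it strips spaces and tabs rather than everything \h matches, and the function name strip_in_place is made up for this example.

```python
import fileinput


def strip_in_place(path, backup=".bak"):
    """Strip trailing spaces and tabs from each line of `path`.

    The original file is kept next to it as `path + backup`.
    """
    with fileinput.input(files=[path], inplace=True, backup=backup) as handle:
        for line in handle:
            body = line.rstrip("\n")
            # With inplace=True, whatever is printed is written back to the file.
            print(body.rstrip(" \t"))
```

Calling strip_in_place("sequences.fa") would rewrite the file and leave the untouched original as sequences.fa.bak.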
Oh my gawk!
All previous solutions would risk modifying your fasta header as well. This one will not.
gawk 'BEGIN{line=0}{ if ($0 !~ /^>/ && $0 ~ / +/) {gsub(/ +/, ""); line++} print}END{print line" lines with white spaces treated" > "/dev/stderr"}' myfasta.fa > output.fa
If you only want to remove the spaces at the end of the lines:
gawk 'BEGIN{line=0}{ if ($0 !~ /^>/ && $0 ~ / +$/) {gsub(/ +$/, ""); line++} print}END{print line" lines with white spaces treated" > "/dev/stderr"}' myfasta.fa > output.fa
Hi,
I think you could make use of the Python rstrip() string method. Just call it while reading your FASTA file, and it will handle the whitespace as you want.
with open('path_to_fasta_file') as fasta:
    for line in fasta:
        print(line.rstrip())
Copy the code into a file, say my_script.py, and run
python my_script.py
There you go
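Note that the loop above strips every line, headers included. If you would rather leave the header lines exactly as they are, as another answer in this thread suggests, a possible variant is sketched below. The function names and the two command-line arguments (input and output path) are assumptions for this example, not part of the original script.

```python
import sys


def strip_sequence_lines(lines):
    """Yield lines without trailing whitespace, leaving '>' header lines unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: passed through as-is (minus the newline).
            yield line
        else:
            # Sequence line: drop trailing spaces and tabs.
            yield line.rstrip()


def main(in_path, out_path):
    with open(in_path) as src, open(out_path, "w") as dst:
        for cleaned in strip_sequence_lines(src):
            dst.write(cleaned + "\n")


# Hypothetical usage: python my_script.py in.fasta out.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Writing to a separate output file keeps the original untouched in case something goes wrong.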
Note: for sed on Mac OS X, you have to use [[:space:]] instead of \s.