Hi friends,
This is probably an easy question for you: how can I check whether there are any extra spaces or blank lines in a FASTA file? What is the appropriate command?
Assuming your headers do not contain any spaces, this checks only for spaces at the beginning or end of a header/sequence line, and for blank lines:
egrep -n "^\s*$|\s$|^\s" filename.fasta
^\s*$ - matches blank lines
\s$ - matches a space at the end of the line
^\s - matches a space at the beginning of the line
Hi Seta,
Can you be more specific about what exactly you want to do? What do you mean by extra space? Do you just want to find out whether there are blank lines, or do you want to remove them or see where they are? Which OS are you using? Assuming you're running Linux, or any OS that has Perl, you can use the following command to print the line numbers of blank lines:
$ perl -ne 'print "$.\n" if /^\s*$/' <fasta file>
And the following command will tell you the number of blank lines in your file on Linux:
$ perl -ne 'print "$.\n" if /^\s*$/' <fasta file> | wc -l
Cheers,
TEJ
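To see which kind of whitespace problem a file has, the three patterns from the answers above can also be counted separately. This is only a minimal sketch: the test data and variable names are made up for illustration, and the POSIX class [[:space:]] is used in place of \s on the assumption that it is more portable across grep versions.

```shell
# Build a small test FASTA with known whitespace problems (hypothetical data).
tmp_fa=$(mktemp)
printf '>seq1\nACGT \n\n>seq2\n ACGT\nACGT\n' > "$tmp_fa"

# grep -c prints the number of matching lines.
# Blank lines: nothing but optional whitespace between start and end.
blank=$(grep -cE '^[[:space:]]*$' "$tmp_fa")
# Trailing whitespace: a non-space character followed by whitespace at the end.
trailing=$(grep -cE '[^[:space:]][[:space:]]+$' "$tmp_fa")
# Leading whitespace: whitespace at the start followed by a non-space character.
leading=$(grep -cE '^[[:space:]]+[^[:space:]]' "$tmp_fa")

echo "blank lines: $blank"
echo "lines with trailing whitespace: $trailing"
echo "lines with leading whitespace: $leading"

rm -f "$tmp_fa"
```

Note that [[:space:]] also matches carriage returns, so a file with Windows line endings would report trailing whitespace on every line.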
Just to add my 2p to the already good answers... You can use cat -vet to visualize non-printable characters and the ends of lines. For example:
echo -e "foo\tbar \rbaz " > test.txt # Test file
cat -vet test.txt
foo^Ibar ^Mbaz $
# Compare to plain cat
cat test.txt
baz bar
The tab character is displayed as ^I, the carriage return as ^M, and the end of line as $. This is very useful for quickly checking files with unexpected characters, such as those produced by Excel.
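For a large file, scrolling through cat -vet output may be impractical, so the same characters can be counted instead. The following is a rough sketch under the assumption that tabs and carriage returns are the characters of interest; the test data and variable names are hypothetical.

```shell
# Hypothetical test file: two lines with Windows line endings, one with a tab.
tmp_fa=$(mktemp)
printf '>seq1\r\nAC\tGT\r\nACGT\n' > "$tmp_fa"

# Store the literal control characters in variables so that the
# patterns do not depend on grep understanding \r or \t escapes.
cr=$(printf '\r')
tab=$(printf '\t')

# Count lines ending in a carriage return (shown as ^M by cat -vet).
crlf_lines=$(grep -c "${cr}\$" "$tmp_fa")
# Count lines containing a tab (shown as ^I by cat -vet).
tab_lines=$(grep -c "$tab" "$tmp_fa")

echo "lines ending in a carriage return: $crlf_lines"
echo "lines containing a tab: $tab_lines"

rm -f "$tmp_fa"
```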
Hi Prakki Rama,
Based on your helpful command, I found 1117159 spaces at the end of line. Could you please tell me the right command to remove them?
You can also use sed:
The sed example will remove only the first instance of a whitespace character per line. Also, it will irreversibly modify the original file without backing it up.
No. I think the above sed command will remove all the spaces in the file. "g" means the substitution is applied globally: to every match on each line, across the whole file. From what I understand, seta found a space only on line 1117159 of the file, so sed should remove it, since it is the only instance on that line. It does not irreversibly modify the original file, because we are not using "-i" and the output goes to the screen.
You changed the 'sed' command. The original contained a trailing 'd' (delete!) instead of 'g' (global). And I assumed you intended to write to the original filename, since you didn't specify a new one (although you would have ended up with an empty file). Otherwise, writing to stdout as shown would not save the edits.
And I believe the OP found 1117159 total whitespaces in the file, not a single space at the end of line 1117159. But I could be wrong on that point.
Sorry, that was my mistake; it was a typo. Instead of "g", I typed "d". But "d" would throw an error anyway, which is why I changed it to "g".
On the point of saving the edits: yes. Unless we redirect to a new file or put "-i" after sed, the edits are not saved.
Oh, I think I misunderstood the sentence. If it was 1117159 spaces at the end of lines, then:
This will edit the same file by removing the spaces at the end of each line. Apologies for overlooking that.
without -i
Hi,
The first perl command and the sed command with -i removed all sequences, so that grep -c ">" file.fa returned 0.
I tried the last command (sed 's/\s*$//' file1.fa > file1_1.fa). The FASTA file now looks like this: (Also, the commands for checking spaces don't work on this FASTA file.)
I do not have much programming experience. Could you please tell me how I can change it back to the normal FASTA format? I mean:
Could you paste a few lines of your original FASTA file?
My original FASTA file has the same shape as the one shown above. I didn't notice that.
I think you somehow got the commands mixed up and changed the original file format. That is why I did not save the edits and was printing to the shell in my first sed command.
Assuming all your contig names contain a digit:
---input---
---output---
Thank you very much for your responses. It worked well
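For anyone landing on this thread later, the non-destructive approach discussed above (write to a new file, then check the result before replacing anything) might be sketched as follows. The sed expression is the one quoted earlier in the thread, with [[:space:]] swapped in for \s as a portability assumption; dropping the lines that end up empty is an extra step added here, and the file contents and variable names are hypothetical.

```shell
# Hypothetical input with trailing spaces and one blank line.
in_fa=$(mktemp)
out_fa=$(mktemp)
printf '>contig1 \nACGT  \n\n>contig2\nGGCC\n' > "$in_fa"

# Strip trailing whitespace, then delete lines that are now empty.
# The output goes to a new file, so the original stays untouched.
sed -e 's/[[:space:]]*$//' -e '/^$/d' "$in_fa" > "$out_fa"

# Sanity checks: the number of records (header lines) must not change,
# and no blank lines or trailing whitespace should remain.
records_before=$(grep -c '^>' "$in_fa")
records_after=$(grep -c '^>' "$out_fa")
# grep exits non-zero when nothing matches, hence the "|| true".
remaining=$(grep -cE '^[[:space:]]*$|[[:space:]]$' "$out_fa" || true)

echo "records before: $records_before, after: $records_after"
echo "problem lines remaining: $remaining"

rm -f "$in_fa" "$out_fa"
```

Comparing the record counts before and after is a cheap way to catch the situation reported earlier in the thread, where grep -c ">" returned 0 after an in-place edit.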