However, I checked my data and this is not the case.
Sequence parsers usually don't throw this error unless there is a problem. You can test this yourself, but in my experience computers tend to be more careful in checking these things than people.
Here is what you can do, assuming your FASTA file is called sequences.fas:
grep ">" sequences.fas | awk '{print $1}' | wc -l
grep ">" sequences.fas | awk '{print $1}' | sort -u | wc -l
These two commands print two numbers: the total number of sequence names and the number of distinct names. If the first number is larger than the second, your sequence names are not unique.
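If the counts differ, it helps to see which names are repeated. Here is a small sketch that reuses the same pipeline; the function name list_duplicate_names is just something I made up for illustration, and it assumes header lines start with ">" and that the name is the first whitespace-separated field:

```shell
# Hypothetical helper: print each sequence name that occurs more than once.
# Assumes FASTA headers start with ">" and the name is the first field.
list_duplicate_names() {
    grep "^>" "$1" | awk '{print $1}' | sort | uniq -d
}
```

Call it as `list_duplicate_names sequences.fas`. Each repeated name is printed once, with the leading ">" kept; no output means no exact duplicates were found.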
If the two numbers were equal in the previous check, try pasting this line:
for i in {3..30}; do grep ">" sequences.fas | cut -c 1-$i | wc -l && grep ">" sequences.fas | cut -c 1-$i | sort -u | wc -l && echo "" ; done
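This will print a series of number pairs, separated by empty lines. Each pair compares the header lines truncated to their first 3 to 30 characters. If at any point the two numbers in a pair are not identical, your sequence names are not unique once they are truncated to that length.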
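To see which truncated names collide at a given length, something along these lines should work. This is only a sketch: list_prefix_collisions is a name I made up, and, as in the loop above, the character count includes the leading ">" of the header line:

```shell
# Hypothetical helper: print header prefixes of a given length that
# occur more than once. The length counts the leading ">" as well.
list_prefix_collisions() {
    grep "^>" "$1" | cut -c 1-"$2" | sort | uniq -d
}
```

For example, `list_prefix_collisions sequences.fas 9` checks the ">" plus the first 8 characters of each name. Whether that matches what your program does depends on how it truncates names, so treat the length as something to experiment with.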
This is a great explanation; however, I checked my data and this is not the case.
Anyway, thanks for the comment!
Another thing to consider is that some programs go further and use only a certain number of characters of each name. For example, if only the first 8 characters or fewer of the NAME field are considered, then two names that differ only after that point become non-unique.
My data had this "ID problem". I checked and corrected it; however, the predictor keeps sending the same message: input sequence names are not unique
This is strange!