Hello friends, how are you? I'd like to know how to run BLAST against a database that I downloaded. I've downloaded a database of protein sequences from DEG (essential genes), and my idea is to run BLAST against this database on my computer, using a long list of proteins as the query.
The steps I performed so far were:
1) Downloaded the bacterial amino acid sequences from http://origin.tubic.org/deg/public/index.php/download
2) makeblastdb -in DEG-bacteria-db.faa -parse_seqids -dbtype prot
output:
FASTA-Reader: Ignoring invalid residues at position(s): On line 91713: 44
FASTA-Reader: Ignoring invalid residues at position(s): On line 102730: 48
FASTA-Reader: Ignoring invalid residues at position(s): On line 110967: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 112557: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 112604: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 112775: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 113161: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 113389: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 113405: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 113418: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 113681: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 113850: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 114182: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 114184: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 114210: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 114576: 18
FASTA-Reader: Ignoring invalid residues at position(s): On line 114656: 18
I don't understand these warnings. Can anyone help?
It means the file contains characters in the sequence lines that BLAST does not accept as residues.
Can you make sure you downloaded a FASTA-format file, and that the unzipping went correctly?
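If it helps, here is a rough sketch of how such a check might look on the command line. It runs on a small made-up file (`sample.faa` and its two records are hypothetical stand-ins, not DEG data); on the real download you would point it at DEG-bacteria-db.faa instead. The `gzip -t` step only applies if the archive is gzip-compressed, which is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical two-record file standing in for DEG-bacteria-db.faa.
printf '>example_1\nMKTAYIAKQR\n>example_2\nMSLLTEVETY\n' > sample.faa

# A FASTA file should start with a ">" header line.
first_char=$(head -n 1 sample.faa | cut -c 1)
echo "first character: $first_char"

# Count the header lines, i.e. the number of sequences in the file.
n_headers=$(grep -c '^>' sample.faa)
echo "number of sequences: $n_headers"

# Assumption: the download is a gzip archive. "gzip -t" tests its integrity
# without extracting it. Here the sample is compressed first so the test has
# something to check.
gzip -c sample.faa > sample.faa.gz
if gzip -t sample.faa.gz; then
    archive_status="OK"
else
    archive_status="corrupt"
fi
echo "archive: $archive_status"
```

If the first character is not ">" or the sequence count is zero, the file is probably not FASTA (for example, an HTML error page saved under the wrong name).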
Some of the sequences contain invalid characters that are not amino acids. For example, there is a dollar sign at the end of the DEG10340547 sequence. But you don't have to worry about the warnings: as the messages themselves say, the invalid residues are simply ignored.
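If you want to see exactly which characters are being skipped, something like the sketch below might do. It is only an illustration: `scan_fasta` and `sample.faa` are names I made up, and the allowed set (letters plus `*` and `-`) is my approximation of a protein alphabet, not necessarily the exact set BLAST accepts. The columns it prints are 1-based and may not match the position convention used in the makeblastdb warnings.

```shell
#!/bin/sh
# Hypothetical sample file; the second record ends with a stray "$".
cat > sample.faa <<'EOF'
>seq1
MKTAYIAKQR
>seq2
MSLLTEVETY$
EOF

# Print line number, 1-based column, and character for every symbol in a
# sequence line that is not a letter, "*" or "-" (an assumed alphabet).
scan_fasta() {
    awk '!/^>/ {
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c !~ /[A-Za-z*-]/)
                printf "line %d, column %d: %s\n", NR, i, c
        }
    }' "$1"
}

scan_fasta sample.faa
```

Run as `scan_fasta DEG-bacteria-db.faa` on the real file. Note that Windows-style line endings would also be reported, since the carriage return is not in the allowed set. Once the database is built, the search itself would be along the lines of `blastp -query your_proteins.faa -db DEG-bacteria-db.faa -outfmt 6 -out results.tsv`, where `your_proteins.faa` and `results.tsv` are placeholder names.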