Create a protein database for local BLAST
3.2 years ago
BATMAN • 0

Hello friends, how are you? I would like to know how to run BLAST against a database that I downloaded. I have downloaded a database of protein sequences from DEG (Database of Essential Genes), and my plan is to run BLAST against it locally on my computer, using a long list of proteins as the query.

The steps I have performed so far:

1) Downloaded the file from http://origin.tubic.org/deg/public/index.php/download (amino acid sequences, bacteria)
2) makeblastdb -in DEG-bacteria-db.faa -parse_seqids -dbtype prot

Output:

    FASTA-Reader: Ignoring invalid residues at position(s): On line 91713: 44
    FASTA-Reader: Ignoring invalid residues at position(s): On line 102730: 48
    FASTA-Reader: Ignoring invalid residues at position(s): On line 110967: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 112557: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 112604: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 112775: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 113161: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 113389: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 113405: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 113418: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 113681: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 113850: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 114182: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 114184: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 114210: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 114576: 18
    FASTA-Reader: Ignoring invalid residues at position(s): On line 114656: 18

I don't understand these warnings. Can anyone help?

makeblastdb • 1.4k views

It means the file contains characters that BLAST does not accept as residues.

Can you make sure that you downloaded a FASTA-format file and that it was unzipped correctly?


Some of the sequences contain characters that are not amino acid codes. For example, there is a dollar sign at the end of the DEG10340547 sequence. You don't have to worry about the warnings, though: BLAST ignores illegal characters.
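To see exactly which records trigger the warnings, one option is to scan the FASTA file for characters outside the protein alphabet. The following is only a minimal sketch: the `ALLOWED` set is an assumption (the 20 standard residues plus common ambiguity codes and `*`/`-`), not necessarily the exact set BLAST accepts, and `find_invalid_residues` is a hypothetical helper name.

```python
import io

# Assumed alphabet: 20 standard amino acids plus common IUPAC extras.
# This is an approximation, not BLAST's exact accepted set.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY" "BJOUXZ" "*-")


def find_invalid_residues(handle):
    """Yield (line_number, record_id, bad_chars) for each sequence line
    that contains characters outside ALLOWED."""
    record_id = None
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Header line: remember the ID (first token after '>').
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            continue
        bad = sorted({ch for ch in line if ch.upper() not in ALLOWED})
        if bad:
            yield line_number, record_id, "".join(bad)


if __name__ == "__main__":
    # Tiny in-memory example; swap in open("DEG-bacteria-db.faa") for the real file.
    demo = io.StringIO(">seq1\nMKTAYIAK$\n>seq2\nMKV\n")
    for line_number, record_id, bad in find_invalid_residues(demo):
        print(f"line {line_number}: {record_id} has invalid characters {bad!r}")
```

The line numbers reported this way are counted from 1 and include header lines, so they should be comparable to the "On line ..." values in the makeblastdb warnings.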

3.2 years ago
Mensur Dlakic ★ 28k

I think the warnings should not be ignored, even if BLAST carries on regardless. For example, this warning:

FASTA-Reader: Ignoring invalid residues at position(s): On line 114184: 18

corresponds to this record at that particular line:

>DEG10420110
Not available now.

Many records contain "Not available now." in place of a sequence, and I think they should be removed: most of those letters are legitimate amino acid codes, so once the spaces and the period are ignored, the placeholder text would end up in the database as a short bogus peptide. It would also be worth alerting the database authors that some of their sequences are corrupt.
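Removing the placeholder records before running makeblastdb could look something like this. It is a sketch under stated assumptions: a record is treated as a placeholder only when its whole sequence equals `Not available now.`, empty records are dropped as well (my addition), and the names `read_fasta` and `drop_placeholders` are hypothetical. It does not touch other stray characters such as the `$` mentioned above.

```python
import io

PLACEHOLDER = "Not available now."  # text seen in the DEG10420110 record


def read_fasta(handle):
    """Yield (header, sequence) pairs; header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_placeholders(src, dst):
    """Copy records from src to dst, skipping placeholder and empty
    sequences. Returns (kept, dropped) counts."""
    kept = dropped = 0
    for header, seq in read_fasta(src):
        seq = seq.strip()
        if not seq or seq == PLACEHOLDER:
            dropped += 1
            continue
        # Sequence is written unwrapped on a single line.
        dst.write(f">{header}\n{seq}\n")
        kept += 1
    return kept, dropped


if __name__ == "__main__":
    demo = io.StringIO(">DEG10420110\nNot available now.\n>seq2\nMKV\nLLA\n")
    out = io.StringIO()
    print(drop_placeholders(demo, out))
    print(out.getvalue(), end="")
```

With real files, `src` and `dst` would be opened file handles, and the makeblastdb command from the question could then be rerun with `-in` pointing at the cleaned file.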
