I want to get all proteins from the NCBI nr database that are smaller than 200 amino acids. I want to use them to build a local database to BLAST a small target protein against. I tried downloading nr.gz from ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA, which is described as:
Sequence databases in FASTA format for use with the stand-alone BLAST programs.
These databases must be formatted using formatdb before they can be used with BLAST.
This was the closest thing I could find for getting all the FASTA sequences, but I have never worked with the database FASTA format, and because of the size of the file (10 GB) I can only manage to open it with less. From what I've seen so far, it seems to be mostly sequence headers.
So I'm looking for a way to download all nr protein sequences OR a different way to do a BLAST search against all proteins <= 200 a.a.
Thanks,
Niek
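One possible way to do the length filtering, as a minimal sketch rather than a tested pipeline: stream nr.gz record by record so the 10 GB file never has to fit in memory, and keep only the records whose sequence is at most 200 residues. The function names (`iter_fasta`, `filter_short`, `filter_file`) and the output file name are made up for illustration. It assumes nr.gz is an ordinary gzip-compressed FASTA file; nr headers can be very long because identical sequences are merged under one header, which may be why the file looks like mostly headers in less.

```python
import gzip


def iter_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_short(in_handle, out_handle, max_len=200, width=80):
    """Copy records with 1..max_len residues to out_handle; return the count."""
    kept = 0
    for header, seq in iter_fasta(in_handle):
        if 0 < len(seq) <= max_len:
            out_handle.write(f">{header}\n")
            # Re-wrap the sequence at a fixed line width.
            for i in range(0, len(seq), width):
                out_handle.write(seq[i:i + width] + "\n")
            kept += 1
    return kept


def filter_file(path_in, path_out, max_len=200):
    """Hypothetical convenience wrapper: gzip FASTA in, plain FASTA out."""
    with gzip.open(path_in, "rt") as src, open(path_out, "w") as dst:
        return filter_short(src, dst, max_len=max_len)
```

Calling `filter_file("nr.gz", "nr_le200.fasta")` would write the short proteins to a new FASTA file. With a current BLAST+ installation, that file could then be formatted with `makeblastdb -in nr_le200.fasta -dbtype prot` (makeblastdb is the BLAST+ replacement for formatdb).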
Does this lower the e-values because the database size gets smaller?
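For reference, the standard Karlin-Altschul expression for the expected number of chance hits with score at least $S$ suggests the answer is yes. The symbols below follow the usual BLAST statistics notation and are not taken from this thread: $m$ is the effective query length, $n$ the effective total length of the database, and $K$, $\lambda$ are constants of the scoring system.

```latex
E = K \, m \, n \, e^{-\lambda S}
\qquad\Longrightarrow\qquad
\frac{E_{\text{subset}}}{E_{\text{full}}} = \frac{n_{\text{subset}}}{n_{\text{full}}}
\quad \text{for the same alignment score } S .
```

So for an identical alignment, the e-value should shrink roughly in proportion to the database size. If e-values comparable to a full nr search are wanted, the BLAST+ programs accept a `-dbsize` option that sets the effective database length used in the calculation.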