I am wondering whether it is possible to use SwissProt as a substitute for the NR database for PSSM generation. That is, given a protein sequence, I want to generate a PSSM using PSI-BLAST. I am asking because I am trying to develop a solution that can ideally run locally on my machine without a compute cluster, so the smaller the database, the better.
However, I have a few concerns:
- Is SwissProt too small for PSSM generation? SwissProt has about 500k sequences, while NR has about 500M.
- Will a PSSM generated from SwissProt be biased in any way?
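For context, the local workflow described above (index SwissProt once, then run PSI-BLAST per query to get a PSSM) could look roughly like the sketch below. It assumes the NCBI BLAST+ command-line tools `makeblastdb` and `psiblast` are installed. The file and database names (`uniprot_sprot.fasta`, `swissprot`, `query.fasta`, `query.pssm`) are hypothetical placeholders, and the 3 iterations and 0.001 inclusion E-value are illustrative settings, not values taken from the question.

```python
from __future__ import annotations

import shlex
import subprocess


def makeblastdb_cmd(fasta: str, db_name: str) -> list[str]:
    """Build the command that indexes a protein FASTA (e.g. SwissProt) as a local BLAST DB."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name]


def psiblast_pssm_cmd(query: str, db_name: str, pssm_out: str,
                      iterations: int = 3, inclusion_evalue: float = 1e-3,
                      threads: int = 4) -> list[str]:
    """Build the psiblast command that writes an ASCII PSSM for one query.

    The defaults (3 iterations, 0.001 inclusion threshold) are illustrative assumptions.
    """
    return [
        "psiblast",
        "-query", query,
        "-db", db_name,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(inclusion_evalue),
        "-num_threads", str(threads),
        "-out_ascii_pssm", pssm_out,
    ]


def run(cmd: list[str]) -> None:
    """Execute one step; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; only print the commands here rather than running them.
    steps = [
        makeblastdb_cmd("uniprot_sprot.fasta", "swissprot"),
        psiblast_pssm_cmd("query.fasta", "swissprot", "query.pssm"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

Returning argument lists instead of shell strings avoids quoting problems and lets the same commands be passed to `run()` once the database and query files exist. The `makeblastdb` step only needs to be done once; only the `psiblast` step repeats per query sequence.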