Hi there,
I have been working on my transcriptome assembly and now it is time to start doing functional annotation.
I have read that a way to do it is using SWISSPROT. The command that I have found is something like:
blastx -db ~/shared_ro/dbs/sprot.mini.pep \
-query Trinity.fasta -num_threads 2 \
-max_target_seqs 1 -outfmt 6 -evalue 1e-5 \
> swissprot.blastx.outfmt6
However, probably because I am not an expert in this field, I don't know where to find the sprot.mini.pep file. I guess it is the database that SWISSPROT uses (or maybe not), but I don't know whether I need to download it or whether it was already installed along with Trinity, since I assembled my transcripts with Trinity and I am following the commands here: https://github.com/trinityrnaseq/BerlinTrinityWorkshop2018/wiki/functional_annotation
Thank you so much in advance
They downloaded sequences they found interesting from SWISSPROT, then ran makeblastdb. I think you can download the "mini-version" from their cloud or create your own database of interesting sequences.
Thank you so much Bastien. I have run the previous command:
and
So then I have files:
Can I use the file called "Trinity.fasta.transdecoder.pep" as a database of interesting sequences?
You could, but you probably want to use that as a query against SWISSPROT for a blastp search. That would be an easier search to parse through.
Ok, I see. But then, how can I create my own database of interesting sequences?
Thanks
Normally you would search against SWISSPROT using proteins you predicted from Trinity analysis to see what their putative function is.
You just want to take a subset of proteins from SWISSPROT (interesting sequences)? Creating your own database would involve makeblastdb from the BLAST+ package, as already noted by @Bastien, and your multi-fasta DB file. If you want to use the entire SWISSPROT database then you can get premade indexes from NCBI's FTP site.
Thanks.
What I want is to search against SWISSPROT using the Trinity.fasta file.
The problem is that I am working with a Plasmodium species whose genome is not available anywhere. So I am not sure whether I need to download a subset of proteins from SWISSPROT (by selecting all the Plasmodium entries in the dataset) or create my own database with makeblastdb, as you both said.
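For the Plasmodium-only route, here is a minimal sketch of how such a subset could be pulled out of a SWISSPROT FASTA file before running makeblastdb. It assumes UniProt-style headers that carry an "OS=<organism>" field. The helper name filter_fasta_by_organism and the file names uniprot_sprot.fasta and plasmodium_sprot are placeholders I made up for illustration, not something from the workshop wiki.

```shell
#!/usr/bin/env bash
# Keep only the FASTA records whose header names a given organism.
# Assumption: headers look like
#   >sp|P12345|ABC_PLAF7 Some protein OS=Plasmodium falciparum OX=... PE=1 SV=1
# filter_fasta_by_organism is a hypothetical helper, not part of BLAST+ or Trinity.
filter_fasta_by_organism() {
    local organism="$1" fasta="$2"
    awk -v pat="OS=${organism}" '
        /^>/ { keep = (index($0, pat) > 0) }  # on each header, decide whether to keep the record
        keep                                   # print the header and its sequence lines if kept
    ' "$fasta"
}

# Possible usage (file names are placeholders; requires BLAST+ to be installed):
# filter_fasta_by_organism "Plasmodium" uniprot_sprot.fasta > plasmodium_sprot.fasta
# makeblastdb -in plasmodium_sprot.fasta -dbtype prot -out plasmodium_sprot
# blastx -db plasmodium_sprot -query Trinity.fasta -num_threads 2 \
#     -max_target_seqs 1 -outfmt 6 -evalue 1e-5 \
#     > plasmodium.blastx.outfmt6
```

Note that a genus-only database will miss transcripts whose best SWISSPROT match is in another organism, so searching the full SWISSPROT set, as suggested above, may still be the safer default.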