Basic question about SWISSPROT
5.8 years ago
luzglongoria ▴ 50

Hi there,

I have been working on my transcriptome assembly and now it is time to start doing functional annotation.

I have read that one way to do this is to search against SWISSPROT. The command that I have found is something like:

blastx -db ~/shared_ro/dbs/sprot.mini.pep \
 -query Trinity.fasta -num_threads 2 \
 -max_target_seqs 1 -outfmt 6 -evalue 1e-5 \
 > swissprot.blastx.outfmt6

However, probably because I am not an expert in this field, I don't know where to find the sprot.mini.pep file. I guess it is the database that SWISSPROT uses (or maybe not), but I don't know whether I need to download it or whether it was already installed along with Trinity. I assembled my transcripts with Trinity and I am following the commands here: https://github.com/trinityrnaseq/BerlinTrinityWorkshop2018/wiki/functional_annotation

Thank you so much in advance

SWISSPROT RNA-Seq transcriptome • 1.8k views

Another very useful metric in evaluating your assembly is to assess the number of fully reconstructed coding transcripts. This can be done by performing a BLASTX search of your assembled transcript sequences to a high quality database of protein sequences, such as provided by SWISSPROT. Searching a large protein database using BLASTX can take a while - longer than we want during this workshop, so instead, we'll search the mini-version of SWISSPROT that comes installed in our data/ directory

They downloaded sequences they found interesting from SWISSPROT, then ran makeblastdb on them. I think you can either download the "mini-version" from their cloud or create your own database of interesting sequences.
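As a rough sketch of what the makeblastdb step could look like, assuming you already have a multi-FASTA file of protein sequences: the helper name build_protein_db and the file names in the example call are made up for illustration, while makeblastdb and its -in, -dbtype and -out options come from the BLAST+ package.

```shell
# Sketch only: build a protein BLAST database from a FASTA file.
# build_protein_db is a hypothetical helper name, not part of BLAST+ or Trinity.
build_protein_db() {
    local fasta="$1"    # multi-FASTA of protein sequences (e.g. a SWISSPROT subset)
    local db_name="$2"  # prefix for the index files that makeblastdb writes
    makeblastdb -in "$fasta" -dbtype prot -out "$db_name"
}

# Example call (placeholder file names):
# build_protein_db my_proteins.fasta my_proteins_db
```

The resulting prefix (my_proteins_db in the example) is what you would then pass to the -db option of blastx or blastp.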


Thank you so much, Bastien. I have run the previous commands:

TransDecoder.LongOrfs -t Trinity.fasta

and

TransDecoder.Predict -t Trinity.fas

So now I have these files:

Trinity.fasta.transdecoder.bed
Trinity.fasta.transdecoder.cds
Trinity.fasta.transdecoder.pep

Can I use the file called "Trinity.fasta.transdecoder.pep" as a database of interesting sequences?


You could, but you probably want to use that file as the query in a blastp search against SWISSPROT. The results of that search would be easier to parse through.
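A minimal sketch of such a blastp search, reusing the thresholds from the blastx command in the question. The helper name run_blastp_swissprot, the database prefix and the output file name are placeholders, not something from the workshop; the blastp options themselves are standard BLAST+ options.

```shell
# Sketch only: search predicted peptides against a protein BLAST database.
# run_blastp_swissprot is a hypothetical helper name.
run_blastp_swissprot() {
    local query="$1"  # peptide FASTA, e.g. Trinity.fasta.transdecoder.pep
    local db="$2"     # prefix of an existing protein BLAST database (placeholder)
    local out="$3"    # tabular (outfmt 6) results file
    blastp -query "$query" -db "$db" \
        -num_threads 2 -max_target_seqs 1 -outfmt 6 -evalue 1e-5 \
        -out "$out"
}

# Example call (the database prefix is an assumption):
# run_blastp_swissprot Trinity.fasta.transdecoder.pep /path/to/swissprot swissprot.blastp.outfmt6
```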


OK, I see. But then, how can I create my own database of interesting sequences?

Thanks


Normally you would search against SWISSPROT using proteins you predicted from Trinity analysis to see what their putative function is.

Do you just want to take a subset of proteins from SWISSPROT (interesting sequences)? Creating your own database would involve running makeblastdb from the BLAST+ package, as already noted by @Bastien, on your multi-FASTA file. If you want to use the entire SWISSPROT database, then you can get premade indexes from NCBI's FTP site.
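If you do go the subset route, one possible way to pull the entries for a given organism out of a SWISSPROT FASTA file is sketched below. It assumes the FASTA was downloaded from UniProt, where each header carries an OS=&lt;organism name&gt; field; the helper name filter_fasta_by_organism and the file names in the example call are made up for illustration.

```shell
# Sketch only: keep FASTA records whose header contains "OS=<organism prefix>".
# filter_fasta_by_organism is a hypothetical helper name.
filter_fasta_by_organism() {
    local organism="$1"  # organism name or genus prefix as written after OS=
    local fasta="$2"     # UniProt-style FASTA file (uncompressed)
    awk -v org="$organism" '
        /^>/ { keep = (index($0, "OS=" org) > 0) }  # decide once per header line
        keep                                        # print header and sequence lines of kept records
    ' "$fasta"
}

# Example call (placeholder names):
# filter_fasta_by_organism "Genus name" uniprot_sprot.fasta > subset.fasta
```

The filtered file can then be given to makeblastdb with -dbtype prot to build the custom database.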


Thanks.

What I want is to search against SWISSPROT using the Trinity.fasta file.

The problem is that I am working with a Plasmodium species whose genome is not available anywhere. So I am not sure whether I need to download a subset of proteins from SWISSPROT (by selecting all the Plasmodium entries in the dataset) or create my own database with makeblastdb, as you both said.
