which database to blast against genomes.
7.5 years ago
Jacob ▴ 10

What I am trying to do is blast about 50 RNA sequences against the genomes of various organisms (All large genomes for vertebrates, other animals). I am trying to do this locally because it takes an enormous amount of time otherwise.

I am trying to do this by running update_blastdb.pl (a script provided with BLAST+ that downloads the databases locally via FTP).

The problem is that this is taking very long and the files I am downloading take up a lot of space. I'm wondering if there is a more sensible way to do this, because the databases are taking up about 100 GB on my computer right now.

Here's how I am running it (if I don't adjust the timeout, it usually doesn't finish):

update_blastdb.pl --timeout 800 nt

I am also downloading

refseq_genomic
nr

And have completed

refseq_rna

I will want to perform these BLAST searches for multiple organisms, but each individual search will be against one organism.

On a side note, I've noticed that certain downloads only finish if I interrupt them with ^C and restart them.

refseq_genomic nt blast blastn blast+ • 2.2k views

What I am trying to do is blast about 50 RNA sequences against the genomes of various organisms (All large genomes for vertebrates, other animals)

What is the reason for doing that? What kind of RNA sequences are these? Are you trying to identify what genome those 50 sequences are from or the actual identity of the genes?


The genes are all human. I'm trying to determine whether each gene has a significant match in a number of other organisms, as judged by the E-value.
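A minimal sketch of that significance check, assuming the searches were written in BLAST+ default tabular format (-outfmt 6), where the E-value is the 11th column. The function name best_hits, the 1e-5 cutoff, and the sample rows are all illustrative assumptions, not values from this thread.

```python
import csv
import io

# Column order of BLAST+ default tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query_id: (subject_id, evalue)} for the lowest-E-value hit
    of each query, keeping only hits at or below max_evalue.

    The default cutoff of 1e-5 is an arbitrary illustration.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or evalue < best[query][1]:
            best[query] = (hit["sseqid"], evalue)
    return best


if __name__ == "__main__":
    # Made-up rows: geneA has one strong and one weak hit, geneB only a weak one.
    sample = (
        "geneA\tchr1\t95.0\t200\t10\t0\t1\t200\t5000\t5199\t1e-80\t300\n"
        "geneA\tchr2\t80.0\t60\t12\t0\t1\t60\t100\t159\t0.5\t40\n"
        "geneB\tchr3\t78.0\t50\t11\t0\t1\t50\t900\t949\t2.0\t30\n"
    )
    for query, (subject, evalue) in best_hits(sample).items():
        print(query, subject, evalue)
```

Queries missing from the returned dictionary had no hit under the cutoff in that organism. Note that E-values depend on database size, so the same alignment can score differently against different genomes.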


If you want to do this as practice, then great, but NCBI/EBI have likely already done this work for you. You can check NCBI's HomoloGene to access multiple alignments, or use the alignments available as a track in the UCSC Genome Browser.


Thanks. Yeah, it is basically for practice right now, but I'm going to be making adjustments in the future, so I need to do it this way.


Depending on your goal, there may be faster solutions than BLAST. BBMap's SendSketch tool can taxonomically classify an organism in a few seconds, depending on the genome size; it can compare your data to nt, RefSeq, and Silva for that purpose. You don't need to download any big files.

As Genomax asked... what are you trying to accomplish? Also, what kind of data do you have?

7.5 years ago
h.mon 35k

If you have just 50 sequences, BLAST them online: you can paste or upload a multi-FASTA file to NCBI web BLAST. It will be quite fast, especially if you leave the search at its defaults, and much faster than downloading the databases. Maybe you have 50 transcriptomes instead of 50 sequences?


I need to do it on the command line because of changes I need to make to my code in the future. Do you have any recommendations for databases to blast against, or for the fastest way to do it? I'll keep using these databases over and over, so if it takes a day to download, that's not a big deal.
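For the command-line route, one possible layout is a separate nucleotide database per organism, so that each search touches only one genome. The sketch below assumes such databases already exist (for example, built from each genome's FASTA file with makeblastdb); the database prefixes, file names, and the 1e-5 cutoff are hypothetical. It only assembles and prints the blastn commands, using the standard -query, -db, -evalue, -outfmt and -out options.

```python
import shlex
from pathlib import Path


def build_blastn_command(query_fasta, db_prefix, out_path, evalue=1e-5):
    """Return the argument list for one blastn run against one local database."""
    return [
        "blastn",
        "-query", str(query_fasta),
        "-db", str(db_prefix),
        "-evalue", str(evalue),
        "-outfmt", "6",          # tab-separated output, one hit per line
        "-out", str(out_path),
    ]


def plan_searches(query_fasta, organism_dbs, out_dir):
    """Map each organism name to the blastn command that searches its database."""
    out_dir = Path(out_dir)
    return {
        name: build_blastn_command(query_fasta, db, out_dir / f"{name}.tsv")
        for name, db in organism_dbs.items()
    }


if __name__ == "__main__":
    # Hypothetical per-organism database prefixes; substitute your own.
    dbs = {"mouse": "db/mouse_genome", "zebrafish": "db/zebrafish_genome"}
    for name, cmd in plan_searches("human_rna.fasta", dbs, "results").items():
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Keeping the command construction separate from execution makes it easy to review or log exactly what will run before committing to a long search.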
