rosetta - nr database
8.4 years ago
vikisvk ▴ 10

I'm trying to model the structure of a membrane protein for the final project of a university course. I'm following this guide (https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/membrane-abinitio). In step 4 (generate .lips4 file) I need to run:

run_lips.pl <fasta file> <span file> <path to blastpgp> <path to nr database> <path to alignblast.pl script>

I have all the required parameters except the "path to nr database", so my question is: how do I get the nr database?

I searched for it and found a suggestion to download an nr.gz file (21GB!) and format it, which is a step I don't fully understand how to do. Is this the right way to solve this?


As the page you linked above indicates, you do need it:

"note that blastpgp and nr database are necessary to run run_lips.pl script"

I am surprised that an assignment that needs such a large database does not take its availability into account (I assume others besides you are taking this course). Since you need to get this done, you can find a pre-formatted nr BLAST database at this site. You need to download all of the nr*.gz files and then unarchive them into one folder. It is going to be a large download.
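The unarchiving step can be sketched as a small shell function. This is only a sketch under assumptions: the archives are assumed to be named nr*.tar.gz (as in the follow-up comment below), and the function name and both folder paths are hypothetical placeholders, not part of BLAST or Rosetta.

```shell
#!/usr/bin/env bash
# Sketch: unpack every downloaded nr volume archive into one database folder.
# extract_nr_volumes is a hypothetical helper name; the nr*.tar.gz pattern is
# an assumption about how the downloaded files are named.

extract_nr_volumes() {
    local archive_dir="$1"   # folder holding the downloaded nr*.tar.gz files
    local db_dir="$2"        # folder that will hold the unpacked database
    local archive
    local found=0

    mkdir -p "$db_dir" || return 1
    for archive in "$archive_dir"/nr*.tar.gz; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$archive" ] || continue
        found=1
        echo "Unpacking $archive"
        tar -xzf "$archive" -C "$db_dir" || return 1
    done

    if [ "$found" -eq 0 ]; then
        echo "No nr*.tar.gz archives found in $archive_dir" >&2
        return 1
    fi
}

# Example with hypothetical paths:
# extract_nr_volumes "$HOME/downloads" "$HOME/blastdb"
```

Stopping at the first failed archive is deliberate: a partially unpacked database is hard to tell apart from a complete one later on.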

blastpgp is no longer present in the current BLAST+ package. Based on this post, the older BLAST version you need to get is 2.2.26, which can be found here.
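Before launching run_lips.pl it may help to confirm that both prerequisites are where you think they are. The following is a minimal sketch, not part of Rosetta or BLAST: check_lips_inputs is a hypothetical helper, and it assumes the "path to nr database" argument is the database prefix (the path without a file extension), with either an alias file (nr.pal) or a single-volume index (nr.pin) sitting next to it.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the blastpgp executable and the nr database prefix.
# check_lips_inputs is a hypothetical helper name. Treating the database
# argument as a prefix such as /data/blastdb/nr is an assumption.

check_lips_inputs() {
    local blastpgp_path="$1"  # path to the legacy blastpgp executable
    local nr_prefix="$2"      # database path without extension
    local status=0

    if [ ! -x "$blastpgp_path" ]; then
        echo "blastpgp not found or not executable: $blastpgp_path" >&2
        status=1
    fi

    # A multi-volume protein database has an alias file (.pal);
    # a single-volume one has an index file (.pin).
    if [ ! -f "$nr_prefix.pal" ] && [ ! -f "$nr_prefix.pin" ]; then
        echo "no protein BLAST database found at prefix: $nr_prefix" >&2
        status=1
    fi

    return "$status"
}

# Example with hypothetical paths:
# check_lips_inputs "$HOME/blast-2.2.26/bin/blastpgp" "$HOME/blastdb/nr"
```

A non-zero return means at least one of the two inputs is missing, and the messages on stderr say which.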


Hello, thank you for your answer and the link to the database you provided.

I am currently learning how to do ab initio modeling of a membrane protein, and I am stuck at the same point as vikisvk: the nr database.

As you mentioned, the nr*.tar.gz database is massive (23GB), and I do not have access to an internet connection fast enough to download it in a reasonable amount of time.

My question is: is there a server or online tool I can use instead of having to download the database?

Any comment will help...


Since the Perl script is going to run locally, I don't think it would be possible to use an online resource for the nr database.

Perhaps you can consider using a web server for the analysis, such as ROSIE or Robetta.


genomax2, what I meant was: is there a way to submit my .fasta and .span files to a web server that can calculate the .lips4 file and return it to me?

I would rather run 100,000 models on an HPC and control/understand my results rather than submit a fasta sequence to a "black box server" and blindly trust what it returns to me. Do you know of such a web server?

I tried Robetta fragment generation, but it does not provide a .lips4 file.

I managed to download the 27GB nr database and unarchive it (131GB), but running the script gave me the following error:

Error in alignblast.pl: blast output file BRD4.blast truncated: 
readline() on closed filehandle MSA at /home/computer/rosetta_src_2016.13.58602_bundle/tools/membrane_tools/run_lips.pl line 95.
Use of uninitialized value $highest_lipo_index in array element at /home/computer/rosetta_src_2016.13.58602_bundle/tools/membrane_tools/run_lips.pl line 205.

I searched online to see how to fix this error, but all the solutions I found are several years old and are not working.

Any idea what I could be doing wrong?


Please use ADD COMMENT to reply to earlier comments and keep these threads easy to follow.
