8.4 years ago
vikisvk
I'm trying to model the structure of a membrane protein for a final project in a university course. I'm following this guide: https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/membrane-abinitio In step 4 (generate the .lips4 file) I need to run:
run_lips.pl <fasta file> <span file> <path to blastpgp> <path to nr database> <path to alignblast.pl script>
I have all the required parameters except the "path to nr database", so my question is: how do I get the nr database?
I searched and found a solution that involves downloading an nr.gz file (21GB!) and formatting it (a step I don't fully understand how to do). Is this the right way to solve this?
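For readers wondering what "formatting" means here: the nr.gz download is a compressed FASTA file, and legacy BLAST needs it indexed before blastpgp can search it. A minimal sketch of that step follows, assuming legacy BLAST 2.2.26 is installed and `formatdb` is on the PATH. The helper name `format_nr`, the `DRY_RUN` switch, and the example paths are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Sketch: turn a downloaded nr.gz (FASTA) into a legacy BLAST protein database.
# Assumption: legacy BLAST (formatdb) is installed; format_nr is a hypothetical helper.

format_nr() {
    local gz="$1"              # e.g. /data/blast/nr.gz
    local fasta="${gz%.gz}"    # decompressed FASTA, e.g. /data/blast/nr

    # Set DRY_RUN=1 to print the commands instead of running them.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "gunzip -c $gz > $fasta"
        echo "formatdb -i $fasta -p T -o T"
        return 0
    fi

    # Decompress while keeping the original archive.
    gunzip -c "$gz" > "$fasta" || return 1
    # -p T: input is protein; -o T: parse sequence IDs and build indexes.
    formatdb -i "$fasta" -p T -o T
}
```

Running `DRY_RUN=1 format_nr /data/blast/nr.gz` prints the two commands without touching anything, which is a cheap way to check the paths before committing to a job that needs a lot of disk space.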
As the page you linked above indicates, you do need the nr database.
I am surprised that an assignment that needs a large DB does not consider its availability (I assume there are others besides you taking this course). Since you need to get this done, you can find a pre-formatted nr blast database at this site. (You need to download all nr*.tar.gz files and then unarchive them into one folder.) It is going to be a large download.
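The unarchive step can be scripted so that every volume lands in the same folder. This is only a sketch: `extract_nr_volumes` is a hypothetical helper, and the directory names in the usage line are placeholders. It assumes the downloaded volumes are named nr*.tar.gz and sit together in one directory.

```shell
#!/bin/sh
# Sketch: unpack every downloaded nr*.tar.gz volume into one database folder.
# extract_nr_volumes is a hypothetical helper name, not part of BLAST or Rosetta.

extract_nr_volumes() {
    local src="$1"     # directory holding the downloaded nr*.tar.gz files
    local dest="$2"    # directory that will hold the unpacked database
    local archive

    mkdir -p "$dest" || return 1
    for archive in "$src"/nr*.tar.gz; do
        # If the glob matched nothing, the pattern itself is left as the value.
        if [ ! -e "$archive" ]; then
            echo "no nr*.tar.gz archives found in $src" >&2
            return 1
        fi
        echo "extracting $archive" >&2
        tar -xzf "$archive" -C "$dest" || return 1
    done
}
```

Usage would look like `extract_nr_volumes ~/downloads/nr ~/blastdb`. Because the database is pre-formatted, no further formatting step should be needed after extraction.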
blastpgp is no longer present in the current blast+ package. Based on this post, the older blast version you need to get is 2.2.26, which can be found here.
Hello, thank you for your answer and the link to the database you provided.
I am currently learning how to model a membrane protein ab initio, and I am stuck at the same point as vikisvk: the nr database.
As you mentioned, the nr*.tar.gz database is MASSIVE (23GB), and I do not have an internet connection fast enough to download it in a reasonable amount of time.
My question is: is there a server or online tool I can use instead of having to download the database?
Any comment will help...
Since the Perl script is going to run locally, I don't think it would be possible to use an online resource for the nr database.
Perhaps you can consider using a web server for the analysis, such as ROSIE or Robetta.
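For the local route, every argument to run_lips.pl has to resolve on your own filesystem, so a quick pre-flight check can save a failed run. The sketch below rests on some assumptions that are not confirmed in this thread: that the blastpgp argument is the executable itself, and that the nr argument is the database base name (for example /path/to/blastdb/nr) rather than the folder, with an alias file `<base>.pal` for a multi-volume database or `<base>.phr` for a single-volume one. `check_lips_inputs` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: sanity-check the five run_lips.pl arguments before launching it.
# check_lips_inputs is a hypothetical helper; the database file checks are assumptions.

check_lips_inputs() {
    local fasta="$1" span="$2" blastpgp="$3" nr="$4" alignblast="$5"
    local status=0

    [ -s "$fasta" ]      || { echo "missing or empty FASTA file: $fasta" >&2; status=1; }
    [ -s "$span" ]       || { echo "missing or empty span file: $span" >&2; status=1; }
    [ -x "$blastpgp" ]   || { echo "blastpgp is not executable: $blastpgp" >&2; status=1; }
    [ -f "$alignblast" ] || { echo "alignblast.pl not found: $alignblast" >&2; status=1; }

    # Assumption: $nr is the database base name, so index files sit next to it.
    if [ ! -f "$nr.pal" ] && [ ! -f "$nr.phr" ]; then
        echo "no $nr.pal or $nr.phr found; is this the database base name?" >&2
        status=1
    fi
    return "$status"
}
```

Call it with the same five arguments, in the same order, that you plan to pass to run_lips.pl. A non-zero exit status means at least one path needs attention.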
genomax2, what I meant was: is there a way to submit my .fasta and .span files to a web server that can calculate the .lips4 file and return it to me?
I would rather run 100,000 models on an HPC and control/understand my results rather than submit a fasta sequence to a "black box server" and blindly trust what it returns to me. Do you know of such a web server?
I tried Robetta fragment generation, but it does not provide a .lips4 file.
I managed to download the 27GB nr database and unarchive it (131GB). But running the script gave me the following error:
I searched online to see how to fix this error, but all the solutions I found are several years old and are not working.
Any idea what I could be doing wrong?
Please use ADD COMMENT to reply to earlier comments and keep these threads easy to follow.