Help setting up lncRNA-screen from github
6.2 years ago

Hello everyone!

I'm new to bioinformatics, and I'm having a really hard time getting this to work. I'm trying to set up lncRNA-screen: https://github.com/NYU-BFX/lncRNA-screen

I'm working with long non-coding RNAs, and this pipeline, created by Applied Bioinformatics Laboratories (New York, NY), does exactly what I need. However, I'm finding it quite hard to set up. Could anyone help me?

The documentation says it uses SGE, which I only got working inside Docker. Is SGE really necessary? I only have one machine.

It also needs r/3.3.0, python/2.7.3, java/1.8 and samtools/1.3 installed and added to the PATH.

It has linked folders for my RNA-seq and ChIP-seq data, but I don't know how they work.

It also says I need https://github.com/NYU-BFX/RNA-Seq_Standard even if I have my own RNA-seq data (which I do).

The documentation says sratoolkit is included, but with my lack of experience I don't understand how that works. Here's the requirements file: https://github.com/NYU-BFX/lncRNA-screen/blob/master/inputs/system_requirement.txt

This is my first post here, so I may have done something wrong or posted this question in the wrong place.

lncRNA GitHub RNA-Seq ChIP-Seq • 1.5k views

Did you ever get this working? I am interested to compare notes regarding the format of the resulting BED file.

6.2 years ago

SGE is a scheduling/job submission system for computing clusters. You don't need it to run locally, though your machine better be a beast, as STAR uses a lot of RAM and is slow without several processors (as is every aligner). If you have only a few samples, you can probably get away with it, but if you have dozens, you're going to be waiting a while. I'd see if your organization has a computing cluster that you can get access to.
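To get a rough idea of whether a single machine is in the right range, a small Python sketch like the one below reports core count and RAM. The thresholds (8 cores, 30 GiB) are assumptions picked for illustration, not figures from the lncRNA-screen documentation, and reading /proc/meminfo only works on Linux.

```python
import os

# Illustrative thresholds only (assumptions, not taken from the pipeline docs).
MIN_CORES = 8
MIN_RAM_GIB = 30.0


def mem_total_gib(meminfo_text):
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in KiB
            return kib / (1024 ** 2)
    raise ValueError("MemTotal line not found")


def check_machine(cores, ram_gib, min_cores=MIN_CORES, min_ram_gib=MIN_RAM_GIB):
    """Return a list of warnings; the list is empty if both checks pass."""
    warnings = []
    if cores < min_cores:
        warnings.append(f"Only {cores} cores; alignment may be slow.")
    if ram_gib < min_ram_gib:
        warnings.append(f"Only {ram_gib:.1f} GiB RAM; the aligner may run out of memory.")
    return warnings


if __name__ == "__main__":
    meminfo_path = "/proc/meminfo"  # Linux only
    if os.path.exists(meminfo_path):
        with open(meminfo_path) as handle:
            ram = mem_total_gib(handle.read())
        cores = os.cpu_count() or 1
        messages = check_machine(cores, ram) or ["Machine meets the illustrative thresholds."]
        for message in messages:
            print(message)
```

How much memory you actually need depends on the genome and index you align against, so treat the numbers as placeholders and adjust them to your own data.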

As for R, Python, Java, and samtools, they are all easy to install and add to your PATH. You can google how to do it for your system, and many distros can install them through their package managers. Or you can look into Anaconda, which makes installing all of those and automatically adding them to your PATH very easy regardless of your OS. sratoolkit is also simple to install.
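Once everything is installed, a quick way to confirm the tools are actually visible on your PATH is something like the sketch below. The executable names in REQUIRED are my assumption of what the pipeline calls; check the requirements file for the exact names and versions.

```python
import shutil

# Assumed executable names; the pipeline may call them differently.
REQUIRED = ["R", "python", "java", "samtools"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only checks that each executable can be found, not that it is the right version, so you would still want to run each tool's own version command afterwards.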

The links are symlinks, basically saying where your folders containing your ChIP-seq and RNA-seq data should be relative to that folder. In this case, it looks like you should have folders for them a level up from the installation directory. You can also just replace the link so that it points to wherever your data files are for each.
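If you want to replace a link rather than move your data, a minimal sketch of repointing a symlink looks like this. All paths here are hypothetical and created in a temporary directory; in practice you would pass the link inside the pipeline folder and the directory where your own data lives.

```python
import os
import tempfile


def repoint_symlink(link_path, new_target):
    """Replace the symlink at link_path with one pointing at new_target."""
    if os.path.islink(link_path):
        os.unlink(link_path)  # removes only the link, never the data it points to
    elif os.path.lexists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(new_target, link_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical layout: the link shipped with the pipeline points one
        # level up, while the real data lives somewhere else.
        default_dir = os.path.join(workdir, "default_rnaseq")
        my_data_dir = os.path.join(workdir, "my_rnaseq_data")
        os.mkdir(default_dir)
        os.mkdir(my_data_dir)

        link = os.path.join(workdir, "rnaseq_link")
        os.symlink(default_dir, link)
        print("before:", os.path.realpath(link))

        repoint_symlink(link, my_data_dir)
        print("after: ", os.path.realpath(link))
```

The function refuses to touch anything that is not a symlink, so it cannot delete a real data folder by mistake.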

Installing something like this is a headache even for experienced bioinformaticists - relative lack of documentation, heavy reliance on relative paths, etc. I imagine it is one of those things that will result in a million errors with uninformative tracebacks that you'll spend days fixing before getting it to run in full. If you don't know how to install/add basic programs to your PATH, I would take a few days to learn how to do that and utilize the command line properly. Otherwise, you will likely continue to be frustrated.


Good to know that I don't need SGE. We have a local server with an old Xeon, Ubuntu Server 16.04 and 32 GB of RAM (and that's all we can have for now).

As for installing r/3.3.0, python/2.7.3, java/1.8 and samtools/1.3 and adding them to the PATH, I know how to do that. What I forgot to mention is that the documentation refers to "Software Environment Management", and I don't know any such tool. I don't strictly need one, but it could be helpful for me in other projects too.

Thanks a lot for your answer Jared.
