As an Illumina customer, do you use BaseSpace, an alternative cloud solution, or pipelines built in-house?
9.9 years ago
mani824 ▴ 70

Hi

We have a HiSeq 2000 and manage our own GATK pipeline, grid, and storage. We are in the process of purchasing a NextSeq 500 and are wondering whether BaseSpace would be a feasible long-term solution for data analysis and storage.

If you are an Illumina customer, do you use BaseSpace? If not, can you explain why?

ADDENDUM: We do whole-exome, tumor/normal (small panels to whole exome), transcriptome, and other custom panels of 1,000-5,000 genes.

Thanks
Manfred

9.9 years ago

Hello,

Well, the answer to your first question depends on which analyses you actually perform. If you rely on in-house pipelines that have no commonly used alternative, then the best solution for large-scale projects is to get familiar with cloud solutions such as AWS. Note that BaseSpace "locks" you into its data management system, so you can't easily incorporate custom data processing steps.

As for the second question, my lab currently works in a rather small field with few publicly available software tools, so we mostly run our in-house pipelines on our own server infrastructure (though we have considered submitting our apps to BaseSpace). Occasionally we have to work with large batches of data sent by our collaborators, so we have developed a pipeline that manages AWS instances for such tasks.


+1 for AWS. I've been using it extensively for the past 6 months or so, and it has sped up my work tremendously. Here are some tips for using AWS:

  • Use spot instances whenever you can. I find that r3.8xlarge and c3.8xlarge instances are not fully in use most of the time (EU-West), so I can easily get them for stretches of days at around 35 cents an hour.
  • Make a small EBS volume or image with all your favorite tools installed (with all dependencies stored locally on it) so you can easily mount this drive on your instance when you start working. Create a bash script that exports your PATH variable. I actually prefer this to making a custom AMI because it is more flexible.
  • Use StarCluster if you need an HPC-like cluster (it uses Sun Grid Engine). It is extremely easy to set up and start, and it can also be configured to use spot instances.
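The PATH tip can be sketched as a small Python helper that scans a tools volume and writes an `export PATH` script you can source after mounting it. The layout is a hypothetical assumption, not from the post: one subdirectory per tool, each with its own `bin/` folder, on a volume mounted at something like `/tools`.

```python
import shlex
from pathlib import Path


def write_path_script(tools_root: str, script_path: str) -> list[str]:
    """Write a bash script that prepends every <tool>/bin directory under
    tools_root to PATH. Assumes (hypothetically) one subdirectory per tool."""
    root = Path(tools_root)
    # Collect only subdirectories that actually contain a bin/ folder
    bin_dirs = sorted(
        str(p / "bin") for p in root.iterdir() if (p / "bin").is_dir()
    )
    lines = ["#!/usr/bin/env bash", "# Source this file: . " + script_path]
    for d in bin_dirs:
        # Quote each directory so paths with spaces survive in bash
        lines.append(f'export PATH={shlex.quote(d)}:"$PATH"')
    Path(script_path).write_text("\n".join(lines) + "\n")
    return bin_dirs
```

For example, `write_path_script("/tools", "/tools/setpath.sh")` followed by `source /tools/setpath.sh` in a shell on the instance. Regenerating the script whenever a tool is added keeps the volume self-describing, which is part of why this can be more flexible than baking a custom AMI.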

Also +1 for AWS.

Many instance types also have local SSD drives (on Ubuntu instances one of them is mounted at /mnt; for example, r3.8xlarge has 2x320 GB SSD on board). Using them can reduce your bills (especially on spot instances) and increase throughput, because local drives sit inside the actual compute node and are much faster than EBS.

We do it this way:

  1. Upload data to S3.
  2. Start spot instance(s).
  3. Download data from S3 to a local folder on the instance's ephemeral drive (e.g. /mnt) using the AWS CLI.
  4. Process data.
  5. Upload results back to S3 using the AWS CLI.

Sometimes we automate all these steps with scripts executed via cloud-init, without accessing the instance through SSH.
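The five steps, run without SSH via cloud-init, can be sketched as a Python function that builds a bash user-data script (cloud-init executes user data that starts with `#!` as a script). The bucket URIs, the pipeline command, and the `/mnt/work` directory are hypothetical placeholders; the sketch also assumes the AWS CLI is installed on the image and the instance has an IAM role with S3 access.

```python
import shlex


def build_user_data(input_uri: str, output_uri: str, command: str,
                    workdir: str = "/mnt/work") -> str:
    """Return a bash user-data script that stages inputs from S3 onto the
    ephemeral drive, runs one command, and uploads results back to S3."""
    q = shlex.quote
    return "\n".join([
        "#!/bin/bash",
        # Abort on the first failing command or unset variable
        "set -euo pipefail",
        # Work on the local (ephemeral) SSD, which is faster than EBS
        f"mkdir -p {q(workdir)}/input {q(workdir)}/output",
        f"cd {q(workdir)}",
        # Step 3: pull inputs from S3 with the AWS CLI
        f"aws s3 sync {q(input_uri)} input/",
        # Step 4: process the data (hypothetical pipeline command)
        command,
        # Step 5: push results back to S3
        f"aws s3 sync output/ {q(output_uri)}",
        # Power off so the instance stops accruing compute charges
        # (stop vs. terminate depends on the instance's shutdown behavior)
        "shutdown -h now",
    ]) + "\n"
```

The resulting string can be saved to a file and passed with the `--user-data` option of `aws ec2 run-instances`. With `set -e`, a failing step exits before the shutdown line, leaving the instance up for inspection; whether that is preferable to always shutting down is a judgment call for spot workloads.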


Thanks Mikhail

I added the apps we run to my question above. Thanks for your feedback. Ours seem to be pretty common workflows that would likely use many of the tools available as BaseSpace apps; however, we are still cautious about being "locked in" to their data management, storage, LIMS, etc.

9.9 years ago
Cliff Beall ▴ 480

I played around with the BaseSpace applications a bit and was pretty unimpressed. In my experience they are not up to the task, though I'm not running the same applications as you.

Take the 16S analysis as one example. It is pretty basic and inaccurate, but I thought it might give a preliminary idea of our samples. However, it would only handle a limited number of samples at a time (I think 50). If you specified more, it failed silently and processed only 50 of however many samples you specified.
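One defensive workaround for a tool that silently truncates its input is to submit samples in explicit batches no larger than the limit and then verify that every sample came back. The limit of 50 is the poster's recollection, and `submit_batch` is a hypothetical stand-in for however a batch is actually launched; nothing here is a BaseSpace API.

```python
from typing import Callable, Iterable


def run_in_batches(samples: list[str],
                   submit_batch: Callable[[list[str]], Iterable[str]],
                   max_batch: int = 50) -> list[str]:
    """Submit samples in chunks of at most max_batch and fail loudly if any
    sample is missing from the returned results."""
    if max_batch < 1:
        raise ValueError("max_batch must be positive")
    processed: list[str] = []
    for start in range(0, len(samples), max_batch):
        batch = samples[start:start + max_batch]
        # submit_batch is expected to return the IDs it actually processed
        processed.extend(submit_batch(batch))
    missing = set(samples) - set(processed)
    if missing:
        raise RuntimeError(
            f"{len(missing)} samples were not processed: {sorted(missing)[:5]}"
        )
    return processed
```

The final completeness check is the important part: it turns a silent truncation into an explicit error, even if the real limit turns out to be different from the one assumed.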

Also, Illumina has not responded to my support requests.


That's helpful to know, Cliff. Did you end up using the results from BaseSpace, or were you more comfortable redoing the analysis with your own pipelines anyway?

The reason I ask is that I am trying to figure out whether

  1. there is a problem with the inherent architecture/data management/reporting of BaseSpace, or
  2. the variety of apps is insufficient for what you want to do.

If it's the second, I foresee many new apps being added in the near future, but if it's the first, that's a bigger problem.


I wish they allowed chaining apps together to build workflows/pipelines.


I can see the usefulness of that, but unlike Seven Bridges, they can't ensure that the output of one app will behave nicely as input to the next app. Apps here are built by third-party vendors.
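The compatibility concern can be illustrated with a minimal sketch: each step declares the file type it consumes and produces, and the chain is validated before anything runs. The `Step` class and the type-label contract are hypothetical illustrations, not how BaseSpace or Seven Bridges actually implement workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    consumes: str   # file type this step expects, e.g. "fastq"
    produces: str   # file type it emits, e.g. "bam"


def validate_chain(steps: list[Step], initial: str) -> None:
    """Raise ValueError if any step's input type does not match the
    output type of the step before it (or the initial input type)."""
    current = initial
    for step in steps:
        if step.consumes != current:
            raise ValueError(
                f"{step.name} expects {step.consumes!r} but receives {current!r}"
            )
        current = step.produces
```

Checking declared types like this is the easy part; it cannot guarantee that, say, a BAM from one vendor's aligner carries the read groups or reference naming the next vendor's tool needs, which is the harder problem when apps come from different third parties.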

9.9 years ago

If possible, it is always good to build your own pipelines, so that you have more control and they are highly customizable.


Thanks for your comment
