9 months ago
Kevin
Hey Biostars!
I'm building a web app that lets you generate ready-to-run bioinformatics pipelines using a simple chat interface.
Here's how it works:
- Describe your desired pipeline, in natural language. For example: "Build me a variant-calling pipeline for Illumina paired-end human data that uses GPU acceleration" or something like "a small RNA-seq pipeline with the following steps: fastqc -> cutadapt -> bowtie2 -> samtools -> deseq2"
- An AI-powered system builds your pipeline, then performs live unit tests on each step, retrying until each step produces appropriate output.
- Run your pipeline in the cloud on your own data, on our SOC2-compliant web platform.
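To make the build-then-test loop above a bit more concrete, here is a minimal Python sketch of the general idea. It is not Shire's implementation: `parse_steps`, `build_pipeline`, and the `generate`/`passes_test` callables are hypothetical stand-ins for the AI code generator and the per-step unit test.

```python
from typing import Callable, Dict, List


def parse_steps(spec: str) -> List[str]:
    # "fastqc -> cutadapt -> bowtie2" becomes ["fastqc", "cutadapt", "bowtie2"]
    return [step.strip() for step in spec.split("->") if step.strip()]


def build_pipeline(
    spec: str,
    generate: Callable[[str, int], str],      # hypothetical: (tool, attempt) -> command
    passes_test: Callable[[str, str], bool],  # hypothetical: (tool, command) -> ok?
    max_attempts: int = 3,
) -> Dict[str, str]:
    """For each step, keep the first generated command that passes its test."""
    pipeline: Dict[str, str] = {}
    for tool in parse_steps(spec):
        for attempt in range(1, max_attempts + 1):
            command = generate(tool, attempt)
            if passes_test(tool, command):
                pipeline[tool] = command
                break
        else:
            # No attempt passed: give up on this step instead of looping forever
            raise RuntimeError(f"step {tool!r} failed after {max_attempts} attempts")
    return pipeline


if __name__ == "__main__":
    # Stub generator and test: pretend the first bowtie2 attempt is broken.
    def fake_generate(tool: str, attempt: int) -> str:
        return f"{tool} --attempt {attempt}"

    def fake_test(tool: str, command: str) -> bool:
        return not (tool == "bowtie2" and command.endswith("1"))

    print(build_pipeline("fastqc -> cutadapt -> bowtie2", fake_generate, fake_test))
```

Capping the retries matters: a step that never passes should surface as an error rather than retry indefinitely.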
I'll be sending out beta invites starting at the end of Feb. If you're interested, you can sign up for the waitlist here: https://shire.bio
Please let me know what you think! I'd love to hear your requests, doubts, concerns, questions, etc :)
Who writes those unit tests?
The unit tests are written and performed by the AI. For an alignment step, for example, the AI will try to align some small FASTQs to a small reference, then it'll make sure a BAM file is produced. Of course, a human could write a more thorough test; here, we're really just checking to see if the step completes without errors. But for a completely hands-off test, I don't think it's too bad!
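For a hands-off alignment test like this, the toy reads should actually come from the toy reference, so the aligner has something to map. A minimal Python sketch, assuming error-free single-end reads are good enough for a smoke test; `make_toy_data` and the file names are hypothetical, not part of the platform.

```python
import random
from pathlib import Path
from typing import Tuple


def make_toy_data(outdir: Path, ref_len: int = 1000, n_reads: int = 50,
                  read_len: int = 50, seed: int = 0) -> Tuple[Path, Path]:
    """Write a random reference FASTA plus error-free reads sampled from it."""
    rng = random.Random(seed)  # fixed seed so the test data is reproducible
    ref = "".join(rng.choice("ACGT") for _ in range(ref_len))
    outdir.mkdir(parents=True, exist_ok=True)

    # Reference FASTA, sequence wrapped at 60 columns
    ref_path = outdir / "toy_ref.fa"
    wrapped = [ref[i:i + 60] for i in range(0, ref_len, 60)]
    ref_path.write_text(">toy_chr\n" + "\n".join(wrapped) + "\n")

    # FASTQ reads copied exactly from random positions of the reference
    fq_path = outdir / "toy_reads.fq"
    with fq_path.open("w") as fq:
        for i in range(n_reads):
            start = rng.randrange(0, ref_len - read_len + 1)
            seq = ref[start:start + read_len]
            # "I" is Phred 40 in Phred+33 encoding
            fq.write(f"@read{i}_pos{start}\n{seq}\n+\n{'I' * read_len}\n")
    return ref_path, fq_path
```

Data like this only shows that a step runs end to end; it says nothing about how the pipeline behaves on real, messy reads.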
Does this mean `bwa` and `touch $sample.bam` will both pass the test? Or will the AI reuse existing tests designed by developers?
Yep, right now `touch $sample.bam` would pass the test.
So you're offering less than current cloud platforms do? Pipelines tested not by bioinformaticians employed by the company but by AI?
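The `touch $sample.bam` point is easy to demonstrate. A minimal Python sketch (both function names are hypothetical): an existence-only check passes on an empty file, while checking for the BAM magic bytes (`BAM\1` inside the BGZF container, which is gzip-compatible) rejects it. Even the stricter check says nothing about whether the alignments are correct; `samtools quickcheck` is the usual command-line tool for this kind of shallow sanity check.

```python
import gzip
from pathlib import Path


def exists_check(path: Path) -> bool:
    """Existence-only check: an empty file from `touch` passes."""
    return path.exists()


def looks_like_bam(path: Path) -> bool:
    """Shallow check: gzip/BGZF payload must start with the BAM magic bytes."""
    try:
        with gzip.open(path, "rb") as fh:
            # An empty file yields b"", so it fails this comparison
            return fh.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Missing file, not gzip at all, or a truncated stream
        return False
```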
Hey Ram, the AI builds the pipeline and takes care of the plumbing, but as a bioinformatician you'll be able to audit and modify what the AI produces. Think of it as a way to do rapid prototyping and hypothesis-testing. And if you need custom pipeline design by a real bioinformatician, we offer that service too, here: https://www.shirebio.com
So, ChatGPT + some test data? I like the idea because testing it on data sounds nice, but there's no way you can tell if the pipeline works just from running it on some toy data.
Plus, given today's environment where you need to bring the code to the data, your site relies on me to bring you the data which is not great. If you were to give me a CWL script or a Snakemake file based on this AI powered pipelining + testing system, I'd use it a lot more, because as a bioinformatician I wish I were doing more analyses and writing fewer glue scripts.
Like "Design a CWL workflow that uses a Docker container to classify xenograft scATAC-seq FASTQs, where the Docker container has seqtk, fastq-tools and Xenome available in its environment". If your tool can design a pipeline and give me test xenograft scATAC-seq FASTQs, I'd pay good money to use it.
This is good feedback, thanks. It might not be too hard to adapt my current system so that it'll spit out a fully containerized pipeline that you can run locally. Something like that. And I thought I was the only one who used fastq-tools all the time; glad to see it's not dead software.
Apologies for bumping an old thread. Ram, I've been thinking about your comment for months. Would love to discuss further if you have a moment for a call. Lmk kevin@shirebio.com