Question

Tool: ChatGPT optimized for bioinformatics questions

24

2.0 years ago

Sasha ▴ 850

Hey everyone!

I launched a new chatbot today that is bioinformatics focused!

It's trained on bioinformatics content and should help debug / ideate much faster for you than vanilla ChatGPT. Check it out here: https://ai.tinybio.cloud/chat

Thanks!

gpt • 16k views

ADD COMMENT • link updated 15 months ago by LauferVA 4.7k • written 2.0 years ago by Sasha ▴ 850

9

How is it trained on PubMed/papers? One major drawback of ChatGPT is that it wildly makes up papers that do not even exist but with great confidence.

ADD REPLY • link 2.0 years ago by ATpoint 88k

3

Hoping to add functionality around this. Stay tuned. Big problem for our space that needs to be resolved.

ADD REPLY • link 2.0 years ago by Sasha ▴ 850

0

This is not a problem with the algorithm.

This is, by design, how the algorithm is intended to work.

The mistake is on the part of the user who uses it as a source of references. This point must be stressed over and over until everyone understands.

ADD REPLY • link 15 months ago by LauferVA 4.7k

5

Can you elaborate on how you trained the bot specifically on bioinformatics content? Where is the content coming from?

ADD REPLY • link 2.0 years ago by GenoMax 151k

4

Hi GenoMax - it's primed on GitHub repos, some personality tuning, and some prompt assumptions based on the context of the site being bioinformatics/research focused. Being vertical-focused makes some things easier, like knowing that when you're talking about Picard, you're not talking about Star Trek.

ADD REPLY • link 2.0 years ago by Sasha ▴ 850

3

Since you have kept the bot openly accessible (at least for now) this is great. People can try it out and hopefully provide feedback. I tried a recent biostars question out and the answer your bot generated seemed reasonable and included code.

ADD REPLY • link 2.0 years ago by GenoMax 151k

2

What's your business model?

ADD REPLY • link 2.0 years ago by Jesse ▴ 870

2

Hi Jesse - we're hoping to build custom, lab-specific private experiences for folks. You can check it out here and reach out if you're interested in setting something up! https://www.tinybio.cloud/paid-offerings/bioinformatics-trained-large-language-models

ADD REPLY • link 2.0 years ago by Sasha ▴ 850

2

Great initiative :D

ADD REPLY • link 2.0 years ago by Decimus Maximus ▴ 130

0

I have tried to use the chat tool on a couple of occasions via the web interface... and I get a client-side exception, without fail. It worked fairly well for me as a bot in the Discord; I wish you'd make that available again.

ADD REPLY • link 18 months ago by Jocelyn • 0

0

client side exception

Maybe something on your network or browser is interfering with the tool?

ADD REPLY • link 18 months ago by Ram 45k

0

Hi, sorry to cut to the chase: I signed up for a trial, but how much is it after that? I can't see that info. Thanks.

ADD REPLY • link 16 months ago by rbj ▴ 40

score 3 · Answer 1 · 2023-06-08

It's very eager to provide code examples with Biopython, but the examples veer into the nonsensical at times:

[image: wait what]

It also shows a weird obsession with bringing up something called the "150 Tomato Genome ReSequencing project" without prompting, to the extent that it hallucinates it into URLs and citations. But, maybe that's just an artifact of my prodding it with obscure edge cases. (I'd be curious if anybody else had that come up!)

On that note, it is evidently capable of providing references that actually exist (looking at you, vanilla ChatGPT), but it gets cagey when you point out that the references don't actually support the claim:

[image: references but not really]

In the interest of full disclosure: I'm very skeptical of getting trustworthy output from generative large language models no matter the approach to training, though I'm trying to stay open-minded for potential use cases.

score 3 · Answer 2 · 2023-06-08

3

2.0 years ago

Alexander ▴ 220

Thanks for sharing! I asked about score calculation; the answer is quite good, though the last example does not seem fully correct.

[image]

ADD COMMENT • link 2.0 years ago by Alexander ▴ 220

1

Awesome - will take a look!

ADD REPLY • link 2.0 years ago by Sasha ▴ 850

0

Maybe you could consider giving a talk about this work for the "sciBerloga" community - see previous talks: https://www.youtube.com/@SciBerloga , announcements: https://t.me/sberlogabig , chat: https://t.me/sberlogabio

ADD REPLY • link 2.0 years ago by Alexander ▴ 220

GenoMax · Answer 3 · 2023-06-09

2

2.0 years ago

LauferVA 4.7k

Sasha

Apropos: Six tips for better coding with ChatGPT

ADD COMMENT • link updated 24 months ago by GenoMax 151k • written 2.0 years ago by LauferVA 4.7k

1

Looks like the URL got garbled in the link somehow (unless it's just me?) -- should be: https://www.nature.com/articles/d41586-023-01833-0

ADD REPLY • link 24 months ago by Jesse ▴ 870

1

Fixed now.

ADD REPLY • link 24 months ago by GenoMax 151k

0

Jesse GenoMax I distinctly recall there was a leading space when I copied the URL, but on pasting I didn't see it - it was probably still there. Apologies for the confusion, and thank you for helping!!! It's an interesting, high-level perspective on this kind of work.

ADD REPLY • link 24 months ago by LauferVA 4.7k

1

Great article - thanks for sharing. We're working on making sure some of these things work out of the box in the coming weeks. Stay tuned!

ADD REPLY • link 24 months ago by Sasha ▴ 850

1

That's great!!! Is there a way I can follow along with you? I'd love to learn from the process.

Thank you!

VAL

ADD REPLY • link 24 months ago by LauferVA 4.7k

score 2 · Answer 4 · 2023-07-10

Hey everyone - we're in the process of adding personalization features to the tool. Here is what we're thinking about right now:

have the tool know your workflow management practices (do you use SLURM/Nextflow/Snakemake?)
have the tool keep in mind that you're working on a specific project (e.g., that you're currently working on an scRNA-seq experiment for CD8 cells or whatever)
have the tool keep in mind that you're only interested in feedback from a particular repo (say you're working with Snakemake, so you're only interested in solutions from the Snakemake repo)
have it connected to your private GitHub repo so it keeps in mind the code that your lab/lab partners have written
have it keep in mind a specific review or standard when you're asking questions (ENCODE, Broad, etc.)

General feedback and ideas are appreciated as well. We want to build the things that folks would actually find useful. We started a Featurebase board (https://tinybio.featurebase.app/) where people can submit ideas, or you can reply to this answer.

score 1 · Answer 5 · 2023-06-07

I asked tinyBio and ChatGPT the same questions, and ChatGPT gave me better answers. This could be because ChatGPT has been out for a while and learned more. These are highly domain-specific queries, by the way:

Question-1, tinyBio:

[image: Q1, tinybio]

Same question, ChatGPT:

What tools can I use to preprocess raw reads that come from multiple species?

When dealing with raw reads from multiple species, there are several tools and approaches you can use for preprocessing. Here are some commonly used tools in bioinformatics:

Trimmomatic: Trimmomatic is a popular tool for trimming adapter sequences, removing low-quality bases, and filtering reads based on quality scores. It supports multiple sequencing platforms and can handle reads from different species.

Cutadapt: Cutadapt is another tool for adapter trimming and removing low-quality bases. It provides a flexible and customizable approach for handling reads from multiple species.

Fastp: Fastp is a fast and efficient tool for preprocessing next-generation sequencing data. It performs adapter trimming, quality filtering, and provides options for handling data from different species.

Seqtk: Seqtk is a toolkit for processing FASTQ files. It can perform various operations such as subsampling, trimming, and filtering reads based on quality scores. Seqtk is versatile and can handle reads from multiple species.

BBTools: BBTools is a suite of bioinformatics tools that includes several utilities for read preprocessing. It provides options for adapter trimming, read filtering, and quality control. BBTools is compatible with reads from different species.

Trimmomatic, Cutadapt, Fastp, Seqtk, and BBTools are command-line tools commonly used in bioinformatics. If you prefer a graphical user interface (GUI), you can consider tools like FastQC, which provides a comprehensive quality control analysis of sequencing data. FastQC is species-agnostic and can be used for analyzing reads from different organisms.

Remember that the choice of preprocessing tools may also depend on the specific requirements of your data, sequencing platform, and downstream analysis. It's always a good idea to explore documentation, tutorials, and user forums specific to the tools you choose to get the best results for your multi-species data.

BBTools is a relevant answer to my question.

Then, I got more specific:

Question-2, tinyBio:

[image: Q2, tinybio]

Same question to ChatGPT:

What tool can I use to preprocess reads from xenograft samples?

When working with xenograft samples, which involve grafting tissues or cells from one species into another, you may encounter some challenges during read preprocessing. One common issue is the presence of reads from both the host species and the grafted species. To address this, you can utilize tools that can differentiate between the two species and selectively preprocess the reads. Here are a few tools commonly used for preprocessing xenograft samples:

Xenome: Xenome is a specialized tool designed specifically for processing xenograft sequencing data. It uses a two-step approach to classify reads into host and graft species based on alignment against reference genomes. Xenome allows you to separate the two species and perform subsequent preprocessing steps independently.

HISAT2 + StringTie: HISAT2 is a fast and accurate aligner that can align reads to both host and graft genomes simultaneously. By using HISAT2 followed by StringTie, you can identify and quantify gene expression from both species separately. This approach can be helpful when analyzing xenograft RNA-seq data.

Bowtie2 + Samtools: Bowtie2 is a popular aligner that can handle reads from multiple species. By aligning the reads to a combined reference genome that includes both host and graft sequences, you can separate the reads based on their alignment positions. Samtools can then be used to extract reads mapped to specific regions, allowing you to preprocess reads from the desired species.

Subread: Subread is a versatile read aligner that can handle reads from multiple species. It supports unique alignment modes and allows you to specify the desired reference genome for each species. Subread can be used to preprocess reads from xenograft samples by aligning them separately to the host and graft genomes.

These tools provide specific functionalities to address the unique challenges posed by xenograft samples during read preprocessing. Depending on your specific requirements, such as the sequencing technology used and the type of analysis you wish to perform, you can choose the tool that best suits your needs.

Xenome is the exact tool I use.
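
For anyone curious what the "combined reference" idea in the quoted ChatGPT answer amounts to (align against host + graft together, then split reads by where they land), here is a rough sketch. It is only an illustration under stated assumptions, not how Xenome or any of the tools above work internally: the `human_`/`mouse_` contig prefixes, the `species_of` and `classify_reads` helpers, and the toy alignment tuples are all made up for the example, and a real pipeline would read alignments from a BAM file rather than a Python list.

```python
from collections import defaultdict

# Hypothetical contig prefixes for a combined reference; contigs would have
# to be renamed this way before building the aligner index.
GRAFT_PREFIX = "human_"
HOST_PREFIX = "mouse_"


def species_of(contig):
    """Map a contig name to 'graft', 'host', or None (unmapped/unknown)."""
    if contig is None:
        return None
    if contig.startswith(GRAFT_PREFIX):
        return "graft"
    if contig.startswith(HOST_PREFIX):
        return "host"
    return None


def classify_reads(alignments):
    """Label each read given (read_name, contig) alignment records.

    A read is 'graft' or 'host' if all of its mapped alignments hit a single
    species, 'ambiguous' if it hits both, and 'unmapped' otherwise.
    """
    hits = defaultdict(set)
    for read_name, contig in alignments:
        species_seen = hits[read_name]  # also registers unmapped reads
        species = species_of(contig)
        if species is not None:
            species_seen.add(species)

    labels = {}
    for read_name, species_seen in hits.items():
        if not species_seen:
            labels[read_name] = "unmapped"
        elif len(species_seen) == 1:
            labels[read_name] = next(iter(species_seen))
        else:
            labels[read_name] = "ambiguous"
    return labels


# Toy records standing in for parsed alignments (not real data).
toy_alignments = [
    ("read1", "human_chr1"),
    ("read2", "mouse_chr5"),
    ("read3", "human_chr2"),
    ("read3", "mouse_chr2"),
    ("read4", None),
]
labels = classify_reads(toy_alignments)
print(labels)
```

The "ambiguous" bucket is the interesting design choice: reads that map to both species are kept apart rather than forced into one side, so you can decide later whether to discard them.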

score 1 · Answer 6 · 2023-06-09

I tested it again today, on a bit of a meta level. I asked it for its opinion of Biostars; it assumed I had supplied some context, went off on a detour about that context, and then came back with a summary statement about Biostars.

What do you think of biostars

Biostars is a great platform for bioinformaticians and researchers to ask questions, share knowledge, and discuss various topics related to bioinformatics. It is a valuable resource for both beginners and experienced professionals in the field.

Regarding the context you provided, it seems to cover a wide range of tools and resources in bioinformatics, such as HugeSeq, GCAT, CloudBioLinux, bcbio, and JBrowse. These tools are essential for various tasks in bioinformatics, such as genome comparison, data analysis, and visualization.

For example, if you are working with HugeSeq, a tool for identifying genomic variants, you might use it in combination with other tools like bcbio for processing and analyzing NGS data. Here's a simple example of how you might use HugeSeq:

# Download and install HugeSeq
git clone https://github.com/StanfordBioinformatics/HugeSeq.git
cd HugeSeq

# Run HugeSeq on your input data
./hugeSeq.sh -i input_data.bam -o output_directory

Similarly, you can use JBrowse for genome visualization and exploration. Here's a basic example of setting up JBrowse:

# Download and install JBrowse
git clone https://github.com/GMOD/jbrowse.git
cd jbrowse
./setup.sh

# Prepare your data for JBrowse
bin/prepare-refseqs.pl --fasta your_genome.fasta

# Add a track for your annotations
bin/flatfile-to-json.pl --gff your_annotations.gff --trackLabel "My Annotations"

# Open JBrowse in your web browser
firefox index.html

In conclusion, Biostars is a helpful platform for discussing and learning about various bioinformatics tools and resources. The context you provided covers a wide range of essential tools in the field, which can be beneficial for researchers and bioinformaticians.

I opened the URL and asked the question. I have no idea where it assumed some random context from.

score 1 · Answer 7 · 2023-06-28

1

23 months ago

Sasha ▴ 850

Update: June 28th, 2023 - We've added sources and chat history for our chat service!

Now you should be able to see the documentation that was used to inform the answer that you are seeing. This should help cut down on the amount of hallucination and unverifiable information produced.

ADD COMMENT • link 23 months ago by Sasha ▴ 850

0

este es lo de lo cual estoy hablando YO (roughly: "this is what I'M talking about")

man, that really does not translate well.

VAL

ADD REPLY • link 15 months ago by LauferVA 4.7k

score 1 · Answer 8 · 2023-06-29

1

23 months ago

guillaume.rbt ★ 1.0k

Thank you for this work! I tested it with a question about somatic variant detection, and the proposed tools are not very relevant (GATK SomaticIndelDetector is deprecated); ChatGPT still has the upper hand for this one.

[image: somatic_variant_question]

ADD COMMENT • link 23 months ago by guillaume.rbt ★ 1.0k

1

Yea - this is an issue! Need to figure out a way to update to only current standards.

ADD REPLY • link 23 months ago by Sasha ▴ 850

score 1 · Answer 9 · 2023-12-13

1

18 months ago

CTLong ▴ 140

I have tried it out for a few days now, mainly by asking conceptual questions about biology and bioinformatics. I believe the content generated by the model is reliable and largely overlaps with the scientific literature. Not sure if someone has mentioned this before, but I have repeatedly come across one issue: sometimes when I ask a second question that is completely different from the first, the LLM repeats its answer to the first question, which is not relevant to the second. That said, it's not a big issue, because re-entering the second question solves it.

Also, in what situations does the LLM provide references for its answers? Sometimes there is a reference and other times there isn't.

[image]

ADD COMMENT • link 18 months ago by CTLong ▴ 140

0

Thank you for pointing this out! Will take a look.

ADD REPLY • link 18 months ago by Sasha ▴ 850

0

Hi - thank you so much for posting about this. This bug has been resolved. Let me know if you have any other feedback!

ADD REPLY • link 17 months ago by Sasha ▴ 850

score 1 · Answer 10 · 2024-01-10

1

17 months ago

Sasha ▴ 850

Update: January 10th, 2024 - We've added the ability to chat with figures and graphs!

Now you should be able to copy and paste a screenshot of a figure you are trying to understand or tweak and get results back based on what you need.

ADD COMMENT • link 17 months ago by Sasha ▴ 850

score 1 · Answer 11 · 2024-03-12

1

15 months ago

Sasha ▴ 850

Update: March 12th, 2024 - we've added the ability to execute code via chat!

You can now debug, create new graphs, and analyze data all through chat. Here is a differential expression analysis for which tinybio wrote and ran the code:

[image: volcano_plot]

You can try it here: ai.tinybio.cloud/chat

ADD COMMENT • link 15 months ago by Sasha ▴ 850

score 0 · Answer 12 · 2023-12-04

0

18 months ago

Sasha ▴ 850

Hey guys -

I wanted to announce some pretty large updates for chat:

The new chat is hooked up to PubMed. Basically, you'll be able to ask about specific PubMed articles and get accurate information back. For example, try "can you give me the latest sc-rna seq review?" - it should pull the latest relevant article from PubMed.
When PubMed is not useful, we also hooked it up to Google Search to give you access to more general or current information. For example, try "can you tell me about the latest pytorch release?" - it will give you the latest information.
Generally, it will be much faster at answering simple queries like "how do I change the column name in my pandas dataframe" and much less verbose than previous iterations of the product.

Check it out here: ai.tinybio.cloud

Thanks!

ADD COMMENT • link 18 months ago by Sasha ▴ 850

0

Tested it with a simple question right now and I see this odd disclaimer:

[image: tinybio snapshot]

It does go on to describe Xenome quite well, but the sarek disclaimer is confusing, as are the four links given at the end of the answer to various sarek GitHub doc pages:

[image: sarek links]

ADD REPLY • link 18 months ago by Ram 45k

0

Wait, tinybio seems more self-aware than I thought:

You: Why did you give me results pertaining to sarek when I never mentioned that tool?

tinybio: I apologize for the confusion. It appears there was a misunderstanding in my attempt to provide you with information. I should have conducted a search specifically for the documentation of "Xenome" rather than providing information about an unrelated tool. Let me correct that by searching for the documentation on "Xenome" in the Github document store. Please give me a moment to perform the search. I apologize for the oversight. The search results returned documents related to "Sarek" instead of "Xenome." This could be due to a limitation in the search function or an absence of "Xenome" related documents in the Github document store that I have access to.

To provide you with an accurate explanation of how "Xenome" works, I will rely on general knowledge about the tool, as specific documentation from the Github document store is not available at this time.

..

..

Still gives me the sarek links though.

ADD REPLY • link 18 months ago by Ram 45k

1

Hi Ram - this should be resolved!

[image]

Along with proper sourcing as well:

[image]

Let us know if you have any other use cases!

ADD REPLY • link 17 months ago by Sasha ▴ 850

0

They are still actively developing it. Over time it will continue to gain some of the functionality that GPT-3.5 has. VAL

ADD REPLY • link 18 months ago by LauferVA 4.7k

2

I understand. The purpose of my comment is to serve as feedback.

ADD REPLY • link 18 months ago by Ram 45k

0

Sorry, I did not read enough of the context.

ADD REPLY • link 18 months ago by LauferVA 4.7k

0

Appreciate it - looking more into this now. Thank you for trying it, guys.

ADD REPLY • link 18 months ago by Sasha ▴ 850