Question

Forum:New to bioinformatics.....

0

6.8 years ago

sms.00196 • 0

Newbie Rant here: Am I the only one who thinks that the learning curve for the NCBI databases and tools is unnecessarily steep?

I want to obtain and use the information, not take hours to locate it.

There...I feel better now!

gene • 3.5k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 6.8 years ago by sms.00196 • 0

1

Well, everybody wants to hit Enter and get the desired result :) Unfortunately (or fortunately?) it doesn't work like that; you have to put in some (sometimes substantial) effort.

ADD REPLY • link 6.8 years ago by grant.hovhannisyan ★ 2.6k

0

Seems to me that there is a golden opportunity for the private sector to develop a software tool that does exactly this. I state what question I want to answer, give the data I have, and hit Enter. The program does the rest: it chooses the best search parameters based on the query and produces the most relevant output.

ADD REPLY • link 6.8 years ago by sms.00196 • 0

4

this has been the business model pushed by dozens of failed bioinformatics startups since 1996

ADD REPLY • link 6.8 years ago by Jeremy Leipzig 23k

2

They have all approached it the wrong way. What would really be a success is this: you give the software the answer you want and hit Enter. Then the software finds the question and the data to satisfy the answer.

ADD REPLY • link 6.8 years ago by h.mon 35k

2

We have that, it's called simulation software :-)

ADD REPLY • link 6.8 years ago by Ram 44k

0

Computer, can you give me the ultimate answer to life the universe and everything?

ADD REPLY • link 6.8 years ago by WouterDeCoster 47k

1

This actually makes a really good point. Multiple tools exist to get tasks done, and we take the call on which of them to use based on context and experience. When such a complex problem is given to computers, especially when we do not have well-established and agreed-upon training sets, machines will take too long to produce an answer that makes sense to the algorithm but not to the research world in general, AKA 42.

ADD REPLY • link 6.8 years ago by Ram 44k

0

Fun fact: ASCII code 42 is the character *, the "everything" wildcard.
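
For anyone who wants to check this, here is a minimal Python sketch. `chr` and `ord` are built-ins, and the standard-library `fnmatch` module is used only as one example of `*` acting as a match-everything wildcard in shell-style patterns; the sample strings are just for illustration.

```python
import fnmatch

star = chr(42)   # the character whose ASCII code is 42
code = ord("*")  # and the reverse lookup

# In shell-style patterns, a lone "*" matches any string
samples = ["life", "the universe", "everything"]
matches_all = all(fnmatch.fnmatch(name, star) for name in samples)

print(star, code, matches_all)
```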

ADD REPLY • link 6.8 years ago by WouterDeCoster 47k

1

You want the private sector to develop an AI bioinformatician? Because that's what you're describing: an entity that takes the right call on methods to use, given data and an idea of the kind of results you're looking for.

ADD REPLY • link 6.8 years ago by Ram 44k

0

Yes, I think algorithms could do this. Sure, lots of work. But what a product! .gov is not going to have the incentive to do this.

ADD REPLY • link 6.8 years ago by sms.00196 • 0

4

You weren't lying, you are new :-)

Current AI technology cannot be trusted to do our job. They can't even be trusted to accurately sequence a genome, and that's just one machine doing one task.

ADD REPLY • link 6.8 years ago by Ram 44k

0

Well said Ram! Full agreement

ADD REPLY • link 6.8 years ago by Kevin Blighe 88k

1

This is a complex issue, I think. Of course it sounds sweet, but this software would cost a lot (it solves important problems, saves time, and is very complex), while the majority of scientific organizations in the world (I don't mean top Western institutions) are poor :) And I guess that's why, as mentioned by @Jeremy, we see a quite limited number of really successful and big bioinformatics companies today (and most of them are quite recent because of the NGS revolution). We can keep discussing.

ADD REPLY • link 6.8 years ago by grant.hovhannisyan ★ 2.6k

0

I've seen the 'products' of these companies: the analyses are invariably conducted incorrectly, and the figures produced are sloppy. On top of this, they charge you both an arm and a leg.

ADD REPLY • link 6.8 years ago by Kevin Blighe 88k

0

Most of the bigger SaaS players (e.g. SBG, DNANexus) have taken a different approach than selling "we can make your analysis easy" to every PI and have made most of their money through partnerships either with government or pharma.

ADD REPLY • link 6.8 years ago by Jeremy Leipzig 23k

0

I partially agree. Learning the nuances of the differences between databases can take time, but for 'newbies' it's typically enough to get comfy with (t)BLAST(n/p) and probably PubMed.

It doesn't get much simpler than a search box and a database though (which is exactly how NCBI is built), so I'm afraid there is no option other than to suck it up and get stuck in!

ADD REPLY • link 6.8 years ago by Joe 22k

0

Maybe that's the problem. The input part seems simple, but interpreting the output is onerous. I want to spend time on science, not the technology. But, unfortunately, the reality is that the scientists and the technicians both need to understand the other. A nice partnership if you can get it.

ADD REPLY • link 6.8 years ago by sms.00196 • 0

0

Is it?

Personally, I think the output of a BLAST search, for instance (I'd wager probably the single most used feature of NCBI's infrastructure, maybe after PMC), is incredibly intuitive. You even get a picture! And I'm not saying that as a now semi-seasoned bioinformatician/molecular biologist. I remember learning what BLAST was for the first time in my bachelor's degree as if it were yesterday, and it just 'clicked'. Granted, this is just a personal anecdote, and not everyone thinks the same way, but I really don't think you could ask for much more intuitive output.

My main criticism of the NCBI interface even now, is that the 'click through-y-ness' is opaque sometimes for sure. Trying to download all the genomes for a particular BioProject, or taxon or whatever, can require some pretty intimate understanding of how NCBI structures their data.
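
To illustrate the point about needing to know how NCBI structures its data: programmatic access goes through the Entrez E-utilities, where you first have to pick the right database and field tag before you get anywhere near a download. Below is a minimal Python sketch that only builds an ESearch URL and sends no request. The helper name `esearch_url`, the choice of the `assembly` database, and the example organism query are assumptions for illustration, not something taken from this thread.

```python
from urllib.parse import urlencode

# Base address of the NCBI Entrez E-utilities
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch URL for one Entrez database.

    db     -- Entrez database name, e.g. "assembly" or "pubmed"
    term   -- Entrez query string, optionally with field tags
    retmax -- maximum number of record IDs to return
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative query: assembly records for one organism (assumed example)
url = esearch_url("assembly", '"Escherichia coli"[Organism]')
print(url)
```

Even in this small sketch you have to know that genome assemblies live in their own database and that the organism filter is a field tag, which is roughly the 'intimate understanding' being described. Turning the returned IDs into actual sequence files takes further steps that are not shown here.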

ADD REPLY • link 6.8 years ago by Joe 22k

2

My main criticism of the NCBI interface even now, is that the 'click through-y-ness' is opaque sometimes for sure.

To be fair, the things you are listing are not simple queries/tasks. They inherently require clarity about what you are trying to achieve. Designing user interfaces (that are intuitive) is an art. I am sure NCBI keeps evolving those over time. Because NCBI is so large a repository, the regression testing they need to do to ensure that things don't break must be a task in itself. Since BLAST is their most popular tool, they do keep that in pretty good shape/current/fresh.

ADD REPLY • link 6.8 years ago by GenoMax 148k

0

Oh, I quite agree, but even something that one might expect to be a simple process (like getting a full GenBank download properly!) can trip you up.

ADD REPLY • link 6.8 years ago by Joe 22k

0

Same issue in wetlab work. The techniques are complex and a field in themselves, but not basic science.
The primary reason I chose in silico work was to get away from the burden of wetlab work. Unfortunately, I find it's as bad or worse. Interested in others' views on this.

ADD REPLY • link 6.8 years ago by sms.00196 • 0

4

So both wet lab and in silico work are too complex for you. Well, I have some bad news for you then.

ADD REPLY • link 6.8 years ago by WouterDeCoster 47k

3

Nonsense, this guy is executive material. He'll be my boss someday.

ADD REPLY • link 6.8 years ago by Jeremy Leipzig 23k

0

"Just built the pipeline already man and do your bioinformatic analysis."

ADD REPLY • link 6.8 years ago by WouterDeCoster 47k

1

"What my team has developed under my expert leadership is a comprehensive solution for data-driven predictive personalized medicine and cost savings. Big Data. IoT."

ADD REPLY • link 6.8 years ago by Jeremy Leipzig 23k

1

I'm going to be sick.

ADD REPLY • link 6.8 years ago by cschu181 ★ 2.8k

1

Don't worry we have an app for that, too.

ADD REPLY • link 6.8 years ago by WouterDeCoster 47k

1

Can you elaborate on the wet lab point? I'm a wet lab biologist (primarily in fact). If anything I think the wet lab has too much of the opposite problem. The techniques are so old and ingrained in a lot of cases, everyone just buys kits and black boxes everything - it's pretty much impossible to understand the basic science of every tiny thing.

But that's how it has to be. Science is too big now for everyone to be an expert in very much at all.

ADD REPLY • link 6.8 years ago by Joe 22k

1

Yes, I was also primarily wet lab based before branching into bioinformatics. Science in the wet lab has become 'kit-based', and many of these kits are expensive and do not even work. Companies even sell them to you after the 'sell-by' date. So much money is wasted in research as a result of this, unnecessarily so. If companies actually tested their products properly, instead of just releasing their own curated white papers, then it might improve.

ADD REPLY • link 6.8 years ago by Kevin Blighe 88k

0

The interface and use of most NCBI resources is reasonably easy, and with a bit of reading and searching you can even become a "power user" rather quickly.

If you are talking about SRA and the SRA Toolkit, on the other hand, I wholeheartedly agree.

ADD REPLY • link 6.8 years ago by h.mon 35k

0

You forgot to add NCBI unix command line utils (to the list with SRA) :-)

ADD REPLY • link 6.8 years ago by GenoMax 148k

score 3 · Answer 1 · 2018-03-26

Am I the only one who thinks that the learning curve for the NCBI databases and tools is unnecessarily steep?

Maybe? One of the skills a bioinformatician needs is on-demand learning. Focus on knowing what tools can do, not learning how to do everything before you start using them. NCBI is one of the easiest databases out there and its results are as clear as can be.

I want to obtain and use the information, not take hours to locate it.

What information? Bioinformatics databases are not service sites like Amazon or eBay; they are data sharing sites. We are not entitled to finding results relevant to us quickly. As time goes on, people come to understand how these tools work and how to get what they need out of them in the best way possible. If you have an idea on how to do something better, do it and we will support you. But remember, new tools don't always help:

score 2 · Answer 2 · 2018-03-26

2

6.8 years ago

GenoMax 148k

Have you looked at the learning resources that NCBI has? There are plenty of YouTube videos if you prefer to learn that way.

ADD COMMENT • link 6.8 years ago by GenoMax 148k