Forum: How do you estimate time to complete a project or task?

Since my very first bioinformatics job, I have struggled with answering the question, "How much time do you need?" and I wonder if anyone can offer some helpful advice for estimating time to complete a bioinformatics project or task.

Obviously, when answering this question in a professional setting, it's important to consider the context: who is asking, how firmly you will be expected to meet your proposed timeline, what decisions or projects cannot proceed until you have finished, etc. But I am asking about the actual challenge of assessing how long it will realistically take you to complete a given project. Some things I find particularly difficult to estimate are:

  • finding the best or most appropriate software to use for a new type of analysis
  • installing software and/or getting it to work properly
  • literature searches
  • fixing bugs

Again, there's plenty of contextual nuance to consider with each of these and in some cases it's just not possible to make an accurate estimate of how long something will take you. All that aside, how do you figure out how much time you need to complete a bioinformatics project?


Ballpark how long you think it'll take you assuming you're half as competent as you think you are, and then quadruple it.

Slightly more seriously (though really only slightly), I don't provide estimates until I've had a chance to do an exploratory data analysis and have a clear understanding of the project intentions or question(s) being asked. Then again, I am supporting a single lab as the sole computationalist, so your situation may require more direct estimates off the bat. For our purposes, "I'll take a quick look by Friday" where I can get initial processing done, check QC, and spot-check expected differences is usually a sufficient response. It gives a timeline for the next update and buys me some time to determine next steps and adjust my plan based on the EDA. But again, I am not charging by the hour and can adjust my workload to meet arbitrary self-set deadlines.

As others have mentioned, if you're already familiar with the data modality, you tend to get a feel for which analyses may reap the most immediate benefit for any given question. As such, estimation tends to depend on the scope of the project. If it's a straightforward question, it's pretty easy to assess the existing options and try a few. If the project is really going to require tying many modalities together, that's a much tougher task and generally more difficult to estimate.

As a rule, if a tool is a giant pain to install or configure, I don't use it if there are other options. Given the ecosystems available for shipping software at this point, sucky installations usually indicate poorly designed/implemented/maintained projects.


assuming you're half as competent as you think you are, and then quadruple it.

This!


You guys are good.

My estimate is that the first 90% takes 90% of the allocated time,

and the last 10% takes another 90%.


I am not charging by the hour

You are living the dream many (dare I say most) core lab people longingly think about :-)


Maybe! Such positions have their own drawbacks. It'd be interesting to hear from folks in different types of positions. Perhaps a forum post for a rainy day.

GenoMax:

But I am asking about the actual challenge of assessing how long it will realistically take you to complete a given project.

I assume your question is about doing this assessment for yourself, i.e. you are not doing it for someone else who is working with you. As more places move to a "cost recovery" model, this is an important issue facing people who work in cores.

When you have prior experience with a particular type of analysis, coming up with an estimate should be relatively routine, especially as you gather more experience. If you are using a workflow management system, scaling an analysis from 10 to hundreds of samples is likely not going to require 10x the time. Most of the time tends to come from the downstream analysis, where iteration with the customer may be required. The more samples you have, the more you will want to build in some fraction of time to cover workflow/software and other incidental failures.

Your bullet list, on the other hand, is tougher to address, especially if you are a one-person shop and are expected to do "everything", including software installs, data management, and customer support. In such cases it may be best to be frank about your inability to come up with a good estimate upfront. You could offer to spend, say, 8-10 hours investigating the problem at hand (part of this you may need to charge to institute overhead, since it would be unfair to charge all of it to the customer) and at the end of that come up with a more concrete estimate. If a piece of software looks particularly complicated or onerous to install, it may be best to cut your losses within 30 minutes and look for alternatives you can suggest to the customer.

At the end of the day, adding a 10-15% markup to the time estimate/cost you come up with may be a safe bet. Saving the customer money relative to the projected cost will always get you kudos.
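The heuristics in this answer (setup and customer iteration dominate, sublinear scaling across samples under a workflow manager, a buffer for incidental failures, a 10-15% markup) can be sketched as a back-of-envelope calculation. Every number below is an illustrative placeholder, not a value from this thread:

```python
# Back-of-envelope estimate of billable hours for an analysis project.
# All default values are made-up illustrations; plug in your own.

def estimate_hours(n_samples,
                   setup_hours=4.0,        # one-time pipeline/config setup
                   per_sample_hours=0.25,  # marginal cost per sample with a workflow manager
                   iteration_hours=10.0,   # downstream analysis + iteration with the customer
                   failure_fraction=0.15,  # buffer for workflow/software/incidental failures
                   markup=0.125):          # the suggested 10-15% safety markup
    base = setup_hours + n_samples * per_sample_hours + iteration_hours
    padded = base * (1 + failure_fraction)  # build in time for things going wrong
    return padded * (1 + markup)            # quote this, hope to come in under it

# Going from 10 to 100 samples does not cost 10x, because the
# fixed setup and iteration time dominate the per-sample cost.
print(estimate_hours(10))
print(estimate_hours(100))
```

With these placeholder numbers, 100 samples comes out at roughly 2.4x the 10-sample quote rather than 10x, which is the point about workflow managers above.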


When you are talking about a cost recovery model, I also think you have to consider what cost the customer is willing to bear. Our core used to estimate 30 hours for a non-trivial RNA-seq project, including everything from QC to publication-quality figures addressing specific biological questions (not just a list of DE genes). However, this meant we had to charge £3,000 for the analysis, and most customers either couldn't or wouldn't pay it, preferring to half-arse it themselves or get the sequencing provider to do a bad job of it (this was at the time when a certain sequencing provider's default RNA-seq analysis was Bowtie1 alignment to the genome followed by a t-test).


I also think you have to consider what cost the customer is willing to bear.

Not something you can control. Cost recovery directives generally come from higher up and rarely consider the ground truth you describe above.


This is my point: if you are told to charge £100 an hour by the higher-ups, but you know that the customer will only agree to pay up to £1,000, then you'd better not estimate the task as taking longer than 10 hours.


Perhaps it works differently at your institution. "Cost recovery" is generally calculated from the time spent and the per-hour cost of the analyst doing that analysis (based on their salary), without making a profit. The only directive is to recover as much of the per-hour cost as you practically can.


it may be best to be frank about your inability to come up with a good estimate upfront.

This is close to what I (plan to) do. I'd give 1 hour of my time for free and give them a very rough estimate, erring on the side of caution. Then I'd keep them posted on the progress. Given that I'm the cheapest resource in my core, people won't mind if I go even 2X the estimate, as long as their results are accurate.

My time requirement also goes down if they have a bioinformatics person who can read datasets and upload/download files. Like you said, I use snakemake, so the number of samples is the least of my time parameters.

JC:

Don't forget to add extra time for:

  • can you change colors/labels/formats?
  • can you add this other dataset?
  • can you compare with X tool?
  • can you explain to me what you did?

etc.

Mensur Dlakic:

Not sure if this will help you, but I will explain my process. I tend to spend the minimum amount of time on literature searches at the start, as I prefer to get right into solving the problem. So early on I like to focus on getting the project off the ground, and get into details later. As a result, I usually repeat the analysis many times as I find a better dataset or more finely tuned parameters. That's not necessarily a bad thing, because by that point the steps are usually automated. It also allows me to test the consistency of various solutions, rather than getting stuck on a single solution arrived at after protracted research.

As to the time it takes, it depends on deadlines, the person's savviness, familiarity with the problem, computer hardware, whether they are a 9-to-5 type of person, their interest in solving the problem, and many other factors. My projects get done faster when I am under a deadline and when I am personally invested in solving a problem. Unless the problem is inherently intractable, it is the literature analysis that takes the most time.
