Hello everyone!
I am a beginner in bioinformatics, working on getting to understand and build WES data analysis pipelines (we work with human DNA in a clinical setting). I am overwhelmed with a torrent of new terms and esoteric-looking tools.
I realized that before I start experimenting with building a pipeline, I must find a solid way to gauge the validity of whatever VCF data the pipeline would produce.
Unfortunately, my supervisor (not a bioinformatician himself) claims that bioinformaticians, quote, "tend to complicate things unnecessarily". He claims that variant calling from WES data "can't be difficult" since "manual BLAST jobs produce a great alignment for any long enough sequence". So, according to him, all that a WES pipeline should do is run algorithms like BLAST for every read, no problem.
Even from my brief exposure to the literature on WES and NGS in general, I get the feeling that this is not nearly as simple. So I need to study sources to, first, understand the complexities myself and, second, be able to articulate them to my boss.
Could anyone kindly point me to some at least remotely accessible literature on the technical challenges that WES pipelines typically solve?
Best wishes,
— Alex.
EDIT: Wording.
Your supervisor is partially right: you can take one of the hundreds of existing pipelines for WES data processing and just run it. However, keep in mind that 1) it requires serious computational power, and I mean SERIOUS, and 2) it takes quite a long time to fully analyse one human WES sample.
There are many technical challenges, mainly caused by misalignments, but in general, if you take a ready-to-use solution from a respectable source such as the Broad Institute, I'd say you can forget about them.
We might have different definitions of "serious" computational power, but for WES any normal workstation will do (if you have a few samples), or any standard server node if you have more than a few. But I definitely agree with using available pipelines instead of reinventing the wheel. In the end, you will need to verify the variants you want to focus on by independent experiments anyway.
By the way, you might tell your supervisor that bioinformatics suffers from the same challenges as the wet lab. Just because someone in the world does something routinely (after a lot of optimization and fine-tuning, and with the necessary experience) doesn't mean that it will work right away in your lab once you start setting it up.
Yep, agreed. I was reacting to this part of the description: "my supervisor (not a bioinformatician himself) claims that bioinformaticians, quote, 'tend to complicate things unnecessarily'. He claims that variant calling from WES data 'can't be difficult'". I made the statement about serious power because I have seen discussions like that end with no computing budget planned at all for a large computational project. For bioinformaticians who have some money invested in servers or workstations, WES analysis is not such a big deal.
Thank you for the encouragement! Yes, the analogy with “birth pangs” in a wet lab setting is helpful.
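Coming back to the original question of how to gauge the validity of the VCF a pipeline produces: one common approach is to run the pipeline on a reference sample that has a published benchmark call set (for example, Genome in a Bottle) and count how many calls agree. Below is a minimal sketch of that idea in Python, not a replacement for a dedicated comparison tool. It assumes plain, uncompressed VCF text, matches variants only on exact (chromosome, position, REF, ALT), ignores genotypes and variant normalization, and uses made-up toy records; the function names `read_variants`, `concordance`, and `toy_vcf` are hypothetical.

```python
import io


def read_variants(handle):
    """Collect (chrom, pos, ref, alt) keys from an uncompressed VCF text stream."""
    variants = set()
    for line in handle:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic records list several ALT alleles separated by commas.
        for alt in alts.split(","):
            if alt != ".":
                variants.add((chrom, pos, ref, alt))
    return variants


def concordance(called, truth):
    """Count exact-match agreement between a call set and a truth set."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed by the caller
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


def toy_vcf(records):
    """Build an in-memory VCF from (chrom, pos, ref, alt) tuples (toy data only)."""
    lines = ["##fileformat=VCFv4.2",
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for chrom, pos, ref, alt in records:
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", "PASS", "."]))
    return io.StringIO("\n".join(lines) + "\n")


# Made-up records, purely for illustration.
truth = read_variants(toy_vcf([("chr1", 100, "A", "G"),
                               ("chr1", 200, "C", "T"),
                               ("chr2", 50, "G", "A")]))
called = read_variants(toy_vcf([("chr1", 100, "A", "G"),
                                ("chr1", 200, "C", "T"),
                                ("chr2", 75, "T", "C")]))

result = concordance(called, truth)
print(result)
```

In this toy case two of the three calls match the truth set, so there is one false positive and one false negative. With real data you would replace the `toy_vcf(...)` streams with opened files and restrict the comparison to the exome target regions; exact matching like this will undercount agreement for indels that are represented differently in the two files, which is one reason dedicated comparison tools exist.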