Question

How to evaluate which genome reference and alignment tool is "the best" for my pipeline?


2.8 years ago

ManuelDB ▴ 110

I have been given a couple of fastq files and I am evaluating which tools are best for eventually producing a vcf (this is a learning activity).

At this point, I am going to do the alignment, comparing BWA-MEM vs Bowtie2 and GRCh38 vs GRCh37. After getting the resulting 4 bam files, I was wondering how I could evaluate their quality. Of course, I will run the four pipelines in parallel and compare the variants called, but at this point I would like to be able to compare the bam files as well.

For example, can I see the number of unmapped reads, or is there any other quality control step that I don't know about?

I am using galaxy.

pipeline NGS • 748 views

updated 2.8 years ago by Istvan Albert 102k • written 2.8 years ago by ManuelDB ▴ 110


"Best" is subjective. Unless you plan to run four pipelines in parallel for every sample you process, you won't know for sure. Something that seems best for the test samples may not be best for future samples.

As for genome build, there is little reason to use GRCh37 unless you have some legacy data to compare to. GRCh38 was released in Dec 2013 and is mature/stable.

"I am using galaxy."

Are you running your own Galaxy mirror? If not, you will not be in control of software versions etc., which matters if this is a long-term project.

Reply 2.8 years ago by GenoMax 147k


Thanks for your comments.

I am running a mirror set up by my university with a limited set of tools, and I can do the runs with these different versions. I think I am going to discard GRCh37 because, as you mention, there is no real reason to use it, and maintaining so many pipelines is a pain. What I wanted to do is at least run these four runs and then compare the results, so I can back up my decision not to use GRCh37.

Reply 2.8 years ago by ManuelDB ▴ 110

score 1 · Answer 1 · 2022-02-01

The two aligners can produce quite similar results though getting to them can take a different path.

With default arguments, bwa will "try harder" and do more work than bowtie2, so it will report more alignments when reads are affected by noise or are otherwise hard to place. The tradeoff is that bowtie2 runs faster with its defaults.
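To check this on your own data, and to answer the question about counting unmapped reads, one option is to tally the SAM FLAG and MAPQ fields for each of the bam files. The sketch below is only an illustration, not part of any aligner: the function names and the MAPQ cutoff of 20 are assumptions made for the example. It works on SAM text lines (for instance, what `samtools view` prints for a bam file), and `samtools flagstat` reports similar counts if your Galaxy instance provides it.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def summarize_sam(lines, min_mapq=20):
    """Tally SAM records; min_mapq is an arbitrary example cutoff."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines (headers start with "@")
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: mapping quality
        # Count extra alignments separately so each read is counted once
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            counts["secondary_or_supplementary"] += 1
            continue
        counts["primary"] += 1
        if flag & FLAG_UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
            if mapq >= min_mapq:
                counts["mapped_mapq_ok"] += 1
    return counts


def mapped_fraction(counts):
    """Fraction of primary records that are mapped."""
    total = counts["primary"]
    return counts["mapped"] / total if total else 0.0
```

Running `summarize_sam` on each of the four alignments gives comparable numbers (mapped fraction, confidently mapped reads) without waiting for the variant calls. Keep in mind that MAPQ is scaled differently by bwa and bowtie2, so the same cutoff does not mean exactly the same thing for both aligners.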

bowtie2 can be tuned to behave much like bwa. In addition, bowtie2 gives you more control over alignment filtering.

The choice of the reference genome is far more impactful, though that too depends critically on what exactly you are looking to discover.

Here is a somewhat older evaluation, from 2017:

Which human reference genome to use?: https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use