Herald: The Biostar Herald for Monday, October 17, 2022

2.5 years ago

Biostar 3.4k

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert, and was edited by Istvan Albert.

A practical guide to methods controlling false discoveries in computational biology | Genome Biology | Full Text (genomebiology.biomedcentral.com)

We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology.

submitted by: Istvan Albert
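For readers who have not worked with FDR control before, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, which I am assuming is among the classic methods the benchmark above refers to. The function name `benjamini_hochberg` and the `alpha` default are my own choices for illustration and do not come from the paper.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return (adjusted p-values, reject flags) using Benjamini-Hochberg.

    Illustrative sketch only; the name and defaults are assumptions.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    # A hypothesis is rejected when its adjusted p-value is at most alpha.
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


if __name__ == "__main__":
    adj, rej = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print(adj, rej)
```

The adjusted values are returned in the same order as the input, so they can be attached back to the original tests (genes, variants, and so on) directly.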

Osamu Gotoh (the Gotoh in the Smith-Waterman-Gotoh algorithm) is still actively maintaining and improving his spaln aligner. The more I run the tool these days, the more I am in awe of how well spaln is developed and how much work Osamu has put into it. It's stunning.
— Heng Li (@lh3lh3) October 12, 2022

submitted by: Istvan Albert
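As background on the "Smith-Waterman-Gotoh" name in the tweet above: Gotoh's contribution is usually described as making local alignment with affine gap penalties (a larger cost to open a gap, a smaller one to extend it) run in quadratic time. Below is a rough sketch of that recurrence that only computes the best score. The function name `swg_score` and all scoring values are assumptions for illustration; this is not how spaln is implemented.

```python
def swg_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of a and b with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Scoring values are illustrative assumptions.
    """
    neg_inf = float("-inf")
    m = len(b)
    # h_prev: best scores for the previous row.
    # f_prev: best scores in the previous row for alignments ending in a gap in b.
    h_prev = [0] * (m + 1)
    f_prev = [neg_inf] * (m + 1)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf  # best score in this row ending in a gap in a
        for j in range(1, m + 1):
            # Either open a new gap or extend the existing one.
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    print(swg_score("ACGTACGT", "ACGTTTACGT"))
```

Keeping only two rows in memory is enough when just the score is needed; recovering the alignment itself would require storing traceback information as well.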

GTDB - About (gtdb.ecogenomic.org)

The Genome Taxonomy Database (GTDB) is an initiative to establish a standardised microbial taxonomy based on genome phylogeny. The genomes used to construct the phylogeny are obtained from RefSeq and GenBank, and GTDB releases are indexed to RefSeq releases, starting with release 76.

The GTDB taxonomy is based on genome trees inferred using FastTree from an aligned concatenated set of 120 single copy marker proteins for Bacteria, and with IQ-TREE from a concatenated set of 53 (starting with R07-RS207) and 122 (prior to R07-RS207) marker proteins for Archaea (download page here).

submitted by: Istvan Albert

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac672/6758240

The Genome Taxonomy Database (GTDB) and associated taxonomic classification toolkit (GTDB-Tk) have been widely adopted by the microbiology community. However, the growing size of the GTDB bacterial reference tree has resulted in GTDB-Tk requiring substantial amounts of memory (∼320 GB) which limits its adoption and ease of use. Here we present an update to GTDB-Tk that uses a divide-and-conquer approach where user genomes are initially placed into a bacterial reference tree with family-level representatives followed by placement into an appropriate class-level subtree comprising species representatives. This substantially reduces the memory requirements of GTDB-Tk while having minimal impact on classification.

submitted by: Istvan Albert

Benchmarking of long-read assemblers for prokaryote... | F1000Research (f1000research.com)

We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used.

submitted by: Istvan Albert

GitHub - google/deepconsensus: DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data. (github.com)

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.

This results in greater yield of high-quality reads. See yield metrics for results on three full SMRT Cells with different chemistries and read length distributions.

submitted by: Istvan Albert

GitHub - rrwick/Trycycler: A tool for generating consensus long-read assemblies for bacterial genomes (github.com)

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes. I.e. if you have multiple long-read assemblies for the same isolate, Trycycler can combine them into a single assembly that is better than any of your inputs.

submitted by: Istvan Albert

KAGE: fast alignment-free graph-based genotyping of SNPs and short indels | Genome Biology | Full Text (genomebiology.biomedcentral.com)

Genotyping is a core application of high-throughput sequencing. We present KAGE, a genotyper for SNPs and short indels that is inspired by recent developments within graph-based genome representations and alignment-free methods. KAGE uses a pan-genome representation of the population to efficiently and accurately predict genotypes. Two novel ideas improve both the speed and accuracy [...]. We show that the accuracy of KAGE is at par with the best existing alignment-free genotypers, while being an order of magnitude faster.

submitted by: Istvan Albert

Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
