The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.
The hazards of genotype imputation when mapping disease susceptibility variants | Genome Biology | Full Text (genomebiology.biomedcentral.com)
The cost-free increase in statistical power of using imputation to infer missing genotypes is undoubtedly appealing, but is it hazard-free? This case study of three type-2 diabetes (T2D) loci demonstrates that it is not; it sheds light on why this is so and raises concerns as to the shortcomings of imputation at disease loci, where haplotypes differ between cases and reference panel.
submitted by: Istvan Albert
Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms | bioRxiv (www.biorxiv.org)
Here we present DRAGEN that utilizes novel methods based on multigenomes, hardware acceleration, and machine learning based variant detection to provide novel insights into individual genomes with ~30min computation time (from raw reads to variant detection). DRAGEN outperforms all other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.
submitted by: Istvan Albert
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad775/7510834
De novo variants (DNVs) are variants that are present in offspring but not in their parents. DNVs are important both for examining mutation rates and for identifying disease-related variation. While efforts have been made to call DNVs, calling them from parent-child sequenced trio data is still challenging. We developed Hare And Tortoise (HAT) as an automated DNV detection workflow for highly accurate short-read and long-read sequencing data. Reliable detection of DNVs is important for human genomics and HAT addresses this need.
submitted by: Istvan Albert
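For readers new to the term, the definition quoted above (present in the offspring, absent from both parents) can be sketched in a few lines. This is only an illustration of the definition, not HAT's workflow; the function name and the toy genotypes are made up for the example, and real callers also have to deal with sequencing error, coverage, and mosaicism.

```python
def candidate_de_novo_alleles(child, mother, father):
    """Return alleles seen in the child but in neither parent at one site.

    Each argument is a tuple of alleles, e.g. ("A", "G").
    Hypothetical helper for illustration only.
    """
    parental_alleles = set(mother) | set(father)
    return sorted(set(child) - parental_alleles)


# Toy genotypes (assumed data): site -> (child, mother, father)
trio_sites = {
    "site_1": (("A", "G"), ("A", "A"), ("A", "A")),  # G appears only in the child
    "site_2": (("C", "T"), ("C", "C"), ("C", "T")),  # T is inherited from the father
}

for site, (child, mother, father) in trio_sites.items():
    print(site, candidate_de_novo_alleles(child, mother, father))
```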
The accuracy of absolute differential abundance analysis from relative count data | PLOS Computational Biology (journals.plos.org)
We used simulated data to explore the consistency of differential abundance calling on renormalized relative abundances versus absolute abundances and find that, while overall consistency is high, [...] consistency can be much lower where there is widespread change in the abundance of features across conditions.
submitted by: Istvan Albert
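A toy calculation may help show why widespread change is the hard case. In the sketch below (feature names and counts are invented, and this is not the simulation used in the paper), one feature keeps the same absolute abundance while the other two increase, yet its relative abundance drops after renormalization.

```python
def relative_abundance(counts):
    """Convert absolute counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


# Assumed toy data: feature_a is unchanged in absolute terms,
# while feature_b and feature_c both increase fourfold.
control = {"feature_a": 100, "feature_b": 100, "feature_c": 100}
treated = {"feature_a": 100, "feature_b": 400, "feature_c": 400}

rel_control = relative_abundance(control)
rel_treated = relative_abundance(treated)

# Absolute fold change is 1.0, but the relative fold change is not.
absolute_fold = treated["feature_a"] / control["feature_a"]
relative_fold = rel_treated["feature_a"] / rel_control["feature_a"]
print(absolute_fold, relative_fold)
```

Read on relative data alone, feature_a looks depleted even though its absolute abundance did not move.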
Characterization and visualization of tandem repeats at genome scale | Nature Biotechnology (www.nature.com)
Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool.
submitted by: Istvan Albert
Genomic background sequences systematically outperform synthetic ones in de novo motif discovery for ChIP-seq data | bioRxiv (www.biorxiv.org)
We performed a massive comparison of the synthetic and genomic approaches to generate background sequences for de novo motif discovery. The synthetic approach shuffled nucleotides in peaks, while the genomic approach randomly selected sequences from the reference genome or only from gene promoters according to the fraction of A/T nucleotides in each sequence. We compiled the benchmark collections of ChIP-seq datasets for mammals and Arabidopsis, and performed de novo motif discovery. We showed that the genomic approach has both more robust detection of the known motifs of target transcription factors and more stringent exclusion of the simple sequence repeats as possible non-specific motifs.
submitted by: Istvan Albert
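The two background strategies described above can be sketched roughly as follows. This is a simplified illustration rather than the authors' implementation: the function names, the candidate list, and the `tolerance` parameter for matching A/T content are all assumptions made for the example.

```python
import random


def at_fraction(seq):
    """Fraction of A and T nucleotides in a sequence."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def synthetic_background(peak, rng):
    """Synthetic approach: shuffle the nucleotides of the peak itself."""
    bases = list(peak)
    rng.shuffle(bases)
    return "".join(bases)


def genomic_background(peak, candidates, rng, tolerance=0.05):
    """Genomic approach: pick a real sequence with a similar A/T fraction.

    `candidates` stands in for sequences sampled from a genome or promoters;
    `tolerance` is an assumed matching threshold, not a value from the paper.
    """
    target = at_fraction(peak)
    matched = [c for c in candidates if abs(at_fraction(c) - target) <= tolerance]
    return rng.choice(matched) if matched else None


rng = random.Random(0)  # fixed seed so the sketch is repeatable
peak = "AATTGC"
print(synthetic_background(peak, rng))
print(genomic_background(peak, ["GGGGCC", "ATATGC"], rng))
```

The shuffled sequence keeps the peak's base composition but not any genomic context, while the genomic pick is a real sequence that only matches on A/T content.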
Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall (genome.cshlp.org)
We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection.
submitted by: Istvan Albert
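As a reminder of what the F1 threshold quoted above means, F1 is the harmonic mean of precision and recall. The sketch below uses the standard definitions; the counts are made up and are not taken from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Standard precision and recall from variant-call counts."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed toy counts: 90 true calls, 10 false calls, 60 missed variants.
precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=60)
print(precision, recall, f1_score(precision, recall))
```

Because it is a harmonic mean, F1 stays low whenever either precision or recall is low, so a cutoff of 0.5 is a fairly permissive bar.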
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription