The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.
Use your PhD thesis title as the prompt 🤓
Is it time to restart this trend? But now with #DALLE3?
(Here is mine: "Protein structure determination using evolutionary information") https://t.co/kJAeGJTDLj pic.twitter.com/fyEBRH4CrO
— Sergey Ovchinnikov 🇺🇦 (@sokrypton) October 31, 2023
submitted by: Istvan Albert
The #gnomAD team is proud to announce the release of gnomAD v4! The v4 dataset includes 730,947 exomes & 76,215 genomes, which is ~5x larger than the v2 & v3 releases combined, & includes nearly 120K indivs of non-European genetic ancestry https://t.co/YKXIFlZwSi #ASHG23 (1/11) pic.twitter.com/hp6zO3xWW9
— Genome Aggregation Database (@gnomad_project) November 1, 2023
submitted by: Istvan Albert
Carp in the Soil. Ridiculous sequencing results revealed… | by Sixing Huang | Medium (dgg32.medium.com)
Ridiculous sequencing results revealed how errors propagated from one research study to a global database
submitted by: Istvan Albert
Nanopore adaptive sampling: a tool for enrichment of low abundance species in metagenomic samples | Genome Biology | Full Text (genomebiology.biomedcentral.com)
Adaptive sampling is a method of software-controlled enrichment unique to nanopore sequencing platforms. To test its potential for enriching rarer species within metagenomic samples, we create a synthetic mock community and construct sequencing libraries with a range of mean read lengths. Enrichment is up to 13.87-fold for the least abundant species in the longest read length library; factoring in the reduced yields from rejecting molecules, the calculated efficiency lowers this to 4.93-fold.
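Why would the yield-adjusted figure be smaller than the raw enrichment? One plausible reading, sketched below under assumed definitions (the paper's exact formulas may differ): fold enrichment compares the target's share of bases in the adaptive run with its share in a control run, while efficiency also multiplies in the drop in total yield caused by ejecting molecules. All numbers here are hypothetical, not from the study.

```python
def fold_enrichment(target_as, total_as, target_ctrl, total_ctrl):
    """Assumed definition: target share of bases (adaptive) / target share (control)."""
    return (target_as / total_as) / (target_ctrl / total_ctrl)

def yield_adjusted_efficiency(target_as, total_as, target_ctrl, total_ctrl):
    """Assumed definition: enrichment scaled by the relative total yield of the two runs."""
    relative_yield = total_as / total_ctrl  # < 1 when rejected molecules cost output
    return fold_enrichment(target_as, total_as, target_ctrl, total_ctrl) * relative_yield

# Hypothetical runs: adaptive sampling produces fewer bases overall, but more target bases.
control = {"target": 1_000_000, "total": 1_000_000_000}
adaptive = {"target": 5_000_000, "total": 400_000_000}

fe = fold_enrichment(adaptive["target"], adaptive["total"], control["target"], control["total"])
eff = yield_adjusted_efficiency(adaptive["target"], adaptive["total"], control["target"], control["total"])
print(f"fold enrichment: {fe:.2f}, yield-adjusted efficiency: {eff:.2f}")
```

Under these definitions the yield-adjusted value reduces to the ratio of absolute target bases, which is why it can never exceed the raw enrichment when the adaptive run's total yield is lower.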
submitted by: Istvan Albert
Omics! Omics!: Concept: An Oxford Nanopore Adaptive Sequencing IDE (omicsomics.blogspot.com)
In adaptive sequencing, bases called from the initial sequencing of a fragment can be used to decide whether to continue sequencing it or, alternatively, to reverse the voltage for that pore only so that the fragment is ejected back to the cis side.
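The per-read decision described above can be pictured as a small function that looks at the bases called so far. This is a hypothetical sketch, not the Oxford Nanopore Read Until API: the function names, the "wait/continue/eject" labels, and the shared-k-mer test are all illustrative assumptions. Real adaptive sampling tools usually compare the prefix against a reference more carefully (for example by alignment).

```python
def kmers(seq, k):
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def decide(prefix, target_kmers, k, min_len):
    """Hypothetical rule: keep a read if its called prefix shares a k-mer with the targets."""
    if len(prefix) < min_len:
        return "wait"      # not enough bases called yet to make a decision
    if kmers(prefix, k) & target_kmers:
        return "continue"  # looks like a target molecule: keep sequencing
    return "eject"         # off-target: reverse the voltage on this pore only

targets = kmers("ACGTACGTTGCA", 5)
print(decide("ACG", targets, 5, 8))           # too short to decide
print(decide("TTACGTACGGGG", targets, 5, 8))  # shares "ACGTA" with the targets
print(decide("GGGGCCCCAAAA", targets, 5, 8))  # no shared k-mer
```

The `min_len` threshold reflects the trade-off the approach has to manage: waiting longer gives more evidence per decision, but every extra base spent on an off-target molecule is pore time not spent on the targets.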
submitted by: Istvan Albert
Genotype prediction of 336,463 samples from public expression data | bioRxiv (www.biorxiv.org)
Here, we developed a statistical model based on the existing reference and alternative read counts from the RNA-seq experiments available through Recount3 to predict genotypes at autosomal biallelic loci in coding regions. We demonstrate the accuracy of our model using large-scale studies that measured both gene expression and genotype genome-wide. We show that our predictive model is highly accurate with 99.5% overall accuracy, 99.6% major allele accuracy, and 90.4% minor allele accuracy. Our model is robust to tissue and study effects, provided the coverage is high enough.
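To give a feel for how reference and alternative read counts can point to a genotype, here is a textbook binomial likelihood caller. It is explicitly not the authors' model: it assumes a fixed per-read error rate, ignores allele-specific expression (which matters for RNA-seq), and uses an arbitrary minimum depth. All of these are illustrative assumptions.

```python
from math import comb

def genotype_likelihoods(ref, alt, err=0.01):
    """Binomial likelihood of the observed ALT count under each biallelic genotype."""
    n = ref + alt
    # Assumed probability that a single read shows the ALT allele under each genotype.
    p_alt = {"0/0": err, "0/1": 0.5, "1/1": 1 - err}
    return {g: comb(n, alt) * p ** alt * (1 - p) ** ref for g, p in p_alt.items()}

def call_genotype(ref, alt, min_depth=10, err=0.01):
    """Return the most likely genotype, or None when coverage is too low."""
    if ref + alt < min_depth:
        return None  # mirrors the caveat that accuracy depends on coverage
    lik = genotype_likelihoods(ref, alt, err)
    return max(lik, key=lik.get)

print(call_genotype(30, 0))   # all reference reads
print(call_genotype(14, 16))  # roughly balanced
print(call_genotype(1, 29))   # nearly all alternative reads
print(call_genotype(2, 3))    # below the depth threshold
```

The depth cut-off is one simple way to reflect the abstract's point that predictions are reliable only when coverage is high enough; minor-allele calls are the hardest because heterozygous sites with skewed expression can look homozygous.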
submitted by: Istvan Albert
CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure | Genome Biology | Full Text (genomebiology.biomedcentral.com)
CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.
submitted by: Istvan Albert
Back to sequences: find the origin of kmers | bioRxiv (www.biorxiv.org)
A vast majority of bioinformatics tools dedicated to processing raw sequencing data rely heavily on the concept of kmers. This reduces data redundancy (and thus memory pressure), helps discard sequencing errors, and provides fixed-size objects that can be easily manipulated and compared with one another. A drawback is that the link between each kmer and the original sequences it came from is generally lost. Given the volume of data involved, recovering this association is costly. In this work, we present "back_to_sequences", a simple tool designed to index a set of kmers of interest and to stream a set of sequences, extracting those that contain at least one of the indexed kmers. In addition, it reports the number of occurrences of the kmers in the sequences. Our results show that back_to_sequences streams ~200 short reads per millisecond, making it possible to search for kmers in hundreds of millions of reads in a few minutes.
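The index-then-stream idea in the abstract can be illustrated with a hash set of kmers and a single pass over the reads. This is a minimal sketch of the concept, not the back_to_sequences implementation: the function names are hypothetical, all kmers are assumed to share one length `k`, and only the forward strand is checked (a real tool may also consider reverse complements).

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

def index_kmers(kmers: Iterable[str]) -> set:
    """Store the kmers of interest in a hash set for O(1) membership tests."""
    return set(kmers)

def back_to_reads(reads: Iterable[Tuple[str, str]], index: set, k: int,
                  counts: Counter) -> Iterator[Tuple[str, int]]:
    """Hypothetical streaming filter: yield (name, hits) for reads containing an indexed kmer.

    `counts` is updated with the total occurrences of each indexed kmer across all reads.
    """
    for name, seq in reads:
        hits = 0
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index:
                counts[kmer] += 1
                hits += 1
        if hits:
            yield name, hits

index = index_kmers(["ACGTA", "TTTTT"])
reads = [("r1", "GGACGTAGG"), ("r2", "CCCCCCCC"), ("r3", "TTTTTTACGTA")]
counts = Counter()
matches = list(back_to_reads(reads, index, 5, counts))
print(matches)
print(counts)
```

Because the reads are consumed as a stream, memory use depends on the size of the kmer index rather than on the number of reads, which is what makes scanning hundreds of millions of reads feasible.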
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere.