The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.
GitHub - hasindu2008/slow5lib: slow5lib is a software library for reading & writing SLOW5 files. (github.com)
SLOW5 is a new file format for storing signal data from Oxford Nanopore Technologies (ONT) devices. SLOW5 was developed to overcome inherent limitations in the standard FAST5 signal data format. SLOW5 can be encoded in human-readable ASCII format, or a more compact and efficient binary format (BLOW5).
Gotcha: from FAST5 we go to SLOW5 and BLOW5. No, seriously. Is this madness? No, this is Bioinformaaaatics.
submitted by: Istvan Albert
scReadSim: a single-cell RNA-seq and ATAC-seq read simulator | Nature Communications (www.nature.com)
We introduce scReadSim, a single-cell RNA-seq and ATAC-seq read simulator that allows user-specified ground truths and generates synthetic sequencing reads (in a FASTQ or BAM file) by mimicking real data. At both read-sequence and read-count levels, scReadSim mimics real scRNA-seq and scATAC-seq data. Moreover, scReadSim provides ground truths, including unique molecular identifier (UMI) counts for scRNA-seq and open chromatin regions for scATAC-seq.
submitted by: Istvan Albert
Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification | bioRxiv (www.biorxiv.org)
Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database.
submitted by: Istvan Albert
GitHub - mourisl/centrifuger: Classifier for metagenomic sequences using FM-index with run-block compressed BWT. (github.com)
Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. It implements a novel lossless compression method, run-block compressed BWT, and other strategies to efficiently reduce the size of a microbial genome database like RefSeq. For example, Centrifuger can classify reads against the 2023 RefSeq prokaryotic genomes containing about 140G nucleotides using 43 GB memory. Despite running on a compressed data structure, Centrifuger is also highly efficient and can process a typical sequencing sample within an hour.
submitted by: Istvan Albert
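For readers wondering why a BWT helps with compression at all, here is a toy sketch. It is not Centrifuger's run-block compressed BWT; it only shows the general idea that the Burrows-Wheeler transform tends to group identical characters into runs, which plain run-length encoding can then shrink without losing information. The function names are made up for this illustration, and the rotation-sorting approach is only practical for tiny inputs.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Recover the original text (without the sentinel) from its BWT.

    Assumes the sentinel sorts before every other character, as '$' does
    for DNA letters.
    """
    # A stable sort of positions by character pairs each character of the
    # sorted first column with the same occurrence in the last column.
    order = sorted(range(len(last)), key=lambda i: last[i])
    row = order[0]  # row 0 starts with the sentinel
    out = []
    for _ in range(len(last) - 1):
        row = order[row]
        out.append(last[row])
    return "".join(out)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse consecutive identical characters into (char, count) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def run_length_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * n for ch, n in runs)


if __name__ == "__main__":
    genome = "ACGT" * 8  # an artificially repetitive toy "genome"
    transformed = bwt(genome)
    runs = run_length_encode(transformed)
    print(transformed)
    print(f"{len(transformed)} characters stored as {len(runs)} runs")
    # Lossless: decoding the runs and inverting the BWT returns the input.
    assert inverse_bwt(run_length_decode(runs)) == genome
```

Real genomes are far less repetitive than this toy string, so the gain there depends on how much sequence is shared across the database.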
Let's say you want to publish in a top-tier journal and need to have a high accuracy predictor of something of great medical importance, such as survival of cancer patients in response to immunotherapy. The easiest route is just to cheat: https://t.co/Yu2sw6ITUw(X,y).predict(X) pic.twitter.com/8np8YPCyGy
— alex rubinsteyn (@iskander) November 17, 2023
submitted by: Istvan Albert
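The `(X,y).predict(X)` fragment in the tweet reads like fitting a model and then predicting on the very same data. Assuming that reading, the sketch below shows why that counts as cheating: a model that simply memorises its training set scores perfectly on it even when the labels are pure noise, while held-out data tells the real story. The tiny 1-nearest-neighbour predictor and the random data are invented for this illustration and are not taken from any paper or library.

```python
import random


def fit_1nn(X, y):
    """Return a predict function that memorises the training set (1-NN)."""
    train = list(zip(X, y))

    def predict(points):
        predictions = []
        for p in points:
            # Pick the label of the closest memorised training point.
            nearest = min(
                train,
                key=lambda xy: sum((a - b) ** 2 for a, b in zip(xy[0], p)),
            )
            predictions.append(nearest[1])
        return predictions

    return predict


def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


rng = random.Random(0)
# Features and labels are independent noise: there is nothing to learn.
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [rng.randint(0, 1) for _ in range(200)]

X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

predict = fit_1nn(X_train, y_train)
train_acc = accuracy(predict(X_train), y_train)  # evaluated on seen data
test_acc = accuracy(predict(X_test), y_test)     # evaluated on unseen data

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

Each training point is its own nearest neighbour, so the first number says nothing about how well the model predicts; only the held-out number does.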
Nobody has any idea how to pay for the journal system because it’s grotesque that we pay $10b a year for a system whose primary effect is to delay the communication of new scientific findings in the name a dysfunctional quality assurance system. https://t.co/cC3VPfrgtS
— Michael Eisen (@mbeisen) November 15, 2023
submitted by: Istvan Albert
GitHub - vcflib/vcflib: C++ library and cmdline tools for parsing and manipulating VCF files with python and zig bindings (github.com)
This is vcflib's first Humpty Dumpty release: vcfcreatemulti is the natural companion to vcfwave. Often variant callers are not perfect. vcfwave with its companion tool vcfcreatemulti can take an existing VCF file that contains multiple complex overlapping and even nested alleles and, unlike Humpty Dumpty, take them apart and put them together again. Thereby, hopefully, creating sane VCF output that is useful for analysis and getting rid of false positives.
submitted by: Istvan Albert
A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar | PLOS Computational Biology (journals.plos.org)
Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of files variants. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices.
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription