The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Mensur Dlakic and Istvan Albert, and was edited by Ram and Istvan Albert.
🧬Exciting Opportunity! The Notice of Special Interest (NOSI) calls for projects to optimize data storage and utilization for the Sequence Read Archive (SRA). 🚀 Propose strategies to reduce storage costs and enhance data efficiency. Learn more: https://t.co/9hK5ELq4tm #SRA pic.twitter.com/foqFrUWX0x
— NIH Office of Data Science Strategy (@NIHDataScience) December 15, 2023
Here is my application:
Generate tar-gzipped archives and distribute them via BitTorrent. Publish an SQLite database with queryable metadata that connects each record to its BitTorrent seed. Call it "The FASTQ Bay".
submitted by: Istvan Albert
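For the curious, the "FASTQ Bay" idea above could be sketched roughly as follows. This is a minimal illustration, not anyone's actual proposal: the table layout, the column names, the `RUN000x` accessions, and the magnet links are all hypothetical placeholders, and the "info-hash" is just a SHA-1 of the accession rather than the hash of a real torrent.

```python
import hashlib
import sqlite3


def placeholder_magnet(accession: str) -> str:
    """Build a fake magnet URI (assumption: a real one would carry the torrent's info-hash)."""
    fake_hash = hashlib.sha1(accession.encode()).hexdigest()
    return f"magnet:?xt=urn:btih:{fake_hash}&dn={accession}.tar.gz"


def build_catalog(rows):
    """Create an in-memory SQLite catalog linking run metadata to magnet links."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE runs (
            accession TEXT PRIMARY KEY,
            organism  TEXT NOT NULL,
            platform  TEXT NOT NULL,
            magnet    TEXT NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?, ?, ?)",
        [(acc, org, plat, placeholder_magnet(acc)) for acc, org, plat in rows],
    )
    conn.commit()
    return conn


def find_magnets(conn, organism):
    """Return (accession, magnet) pairs for one organism, sorted by accession."""
    cur = conn.execute(
        "SELECT accession, magnet FROM runs WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return cur.fetchall()


# Hypothetical example records, not real SRA accessions.
EXAMPLE_ROWS = [
    ("RUN0001", "Homo sapiens", "ILLUMINA"),
    ("RUN0002", "Mus musculus", "ILLUMINA"),
    ("RUN0003", "Homo sapiens", "OXFORD_NANOPORE"),
]

catalog = build_catalog(EXAMPLE_ROWS)
hits = find_magnets(catalog, "Homo sapiens")
for accession, magnet in hits:
    print(accession, magnet)
```

Shipping the catalog as a single SQLite file would let anyone query it offline with plain SQL, while the heavy lifting of moving the archives is left to the swarm.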
Variant Graph Craft (VGC): A Comprehensive Tool for Analyzing Genetic Variation and Identifying Disease-Causing Variants. | bioRxiv (www.biorxiv.org)
We introduce Variant Graph Craft (VGC), a VCF file visualization and analysis tool offering a wide range of features for exploring genetic variations, including extraction of variant data, intuitive visualization of variants, and the provision of a graphical representation of samples, complete with genotype information. Furthermore, VGC seamlessly integrates with external resources to offer valuable insights into gene function and variant frequencies in sample data.
submitted by: Istvan Albert
The impact of PCR duplication on RNA-seq data generated using NovaSeq 6000, NovaSeq X, AVITI and G4 sequencers. | bioRxiv (www.biorxiv.org)
In this study, we investigate the impact of input amount and PCR cycle number on the PCR duplication rate and on the RNA-seq data quality using a broad range of inputs (1 ng - 1,000 ng) for RNA-seq library preparation with unique molecular identifiers (UMIs) and sequencing the data on four different short-read sequencing platforms: Illumina NovaSeq 6000, Illumina NovaSeq X, Element Biosciences AVITI, and Singular Genomics G4. Across all platforms, samples of input amounts greater than 125 ng had a negligible PCR duplication rate and the number of PCR cycles did not have a significant effect on data quality. However, for input amounts lower than 125 ng we observed a strong negative correlation between input amount and the proportion of PCR duplicates; between 34% and 96% of reads were discarded via deduplication.
submitted by: Istvan Albert
https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkad1167/7460324
Comprehensive simulations and test data show that an edgeR analysis of the scaled counts is more powerful and efficient than previous differential transcript expression pipelines while providing correct control of the false discovery rate. Simulations explore a wide range of scenarios including the effects of paired vs single-end reads, different read lengths and different numbers of replicates.
submitted by: Istvan Albert
Surge in number of ‘extremely productive’ authors concerns scientists (www.nature.com)
Some researchers publish a new paper every five days, on average. Data trackers suspect not all their manuscripts were produced through honest labour.
Oh really?
submitted by: Istvan Albert
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad726/7460205?login=false
Longer reads produced by PacBio or Oxford Nanopore sequencers could more frequently span the breakpoints of structural variants (SVs) than shorter reads. Therefore, existing long-read mapping methods often generate wrong alignments and variant calls. Compared to deletions and insertions, inversion events are more difficult to detect since the anchors in inversion regions are nonlinear to those in SV-free regions. To address this issue, this study presents a novel long-read mapping algorithm (named invMap).
submitted by: Istvan Albert
GOAT: efficient and robust identification of geneset enrichment | bioRxiv (www.biorxiv.org)
Geneset enrichment analysis is foundational to the interpretation of high throughput biology. We here present GOAT (https://github.com/ftwkoopmans/goat), a parameter-free permutation-based algorithm for geneset enrichment analysis of preranked genelists. Estimated geneset p-values are well calibrated under the null hypothesis and invariant to geneset size. GOAT consistently identifies more Gene Ontology terms in real-world datasets than current methods and is available as an R package and online tool.
submitted by: Istvan Albert
A framework for quantifiable local and global structure preservation in single-cell dimensionality reduction | bioRxiv (www.biorxiv.org)
Yet another dimensionality reduction method for scRNA-seq
submitted by: Mensur Dlakic
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
the digest continues to post great relevant stuff, thanks Istvan :)