The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Mensur Dlakic and Istvan Albert, and was edited by Istvan Albert.
Haplotype-resolved assembly of diploid genomes without parental data | Nature Biotechnology (www.nature.com)
Routine haplotype-resolved genome assembly from single samples remains an unresolved problem. Here we describe an algorithm that combines PacBio HiFi reads and Hi-C chromatin interaction data to produce a haplotype-resolved assembly without the sequencing of parents. Applied to human and other vertebrate samples, our algorithm consistently outperforms existing single-sample assembly pipelines and generates assemblies of similar quality to the best pedigree-based assemblies.
submitted by: Istvan Albert
Completing the human genome (www.science.org)
The Telomere-to-Telomere (T2T) Consortium has completed a challenging 8% of the human genome left unresolved by the initial Human Genome Project.
22 years after the publication titled "The Sequence of the Human Genome", it seems that the human genome sequence has finally been completed.
Links:
- Sequence data: https://github.com/marbl/CHM13
- Code used for stitching: https://github.com/snurk/sg_sandbox
submitted by: Istvan Albert
AI predicts the effectiveness and evolution of gene promoter sequences (www.nature.com)
A long-standing goal of biology is the ability to predict gene expression from DNA sequence. A type of artificial intelligence known as a neural network, combined with high-throughput experiments, now brings this goal a step closer.
submitted by: Istvan Albert
Vcfanno: fast, flexible annotation of genetic variants | Genome Biology | Full Text (genomebiology.biomedcentral.com)
The integration of genome annotations is critical to the identification of genetic variants that are relevant to studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Here we describe vcfanno, which flexibly extracts and summarizes attributes from multiple annotation files and integrates the annotations within the INFO column of the original VCF file.
submitted by: Istvan Albert
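To make the idea of integrating annotations into the INFO column concrete, here is a minimal toy sketch in Python. It is not vcfanno's implementation or interface: the function name `annotate_info`, the `toy_gene` key, and the tuple-based annotation list are all hypothetical, and the overlap test only looks at the variant's POS, which is a simplification that suits single-base variants.

```python
def annotate_info(vcf_line, annotations, key="toy_gene"):
    """Append key=value to the INFO column for annotations overlapping the variant.

    annotations: list of (chrom, start, end, name) tuples using BED-style
    0-based, half-open coordinates. All names here are illustrative only.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])  # VCF POS is 1-based

    # A 1-based position p lies inside a 0-based half-open interval
    # [start, end) exactly when start < p <= end.
    hits = [name for (c, start, end, name) in annotations
            if c == chrom and start < pos <= end]

    if hits:
        value = ",".join(hits)
        info = fields[7]  # INFO is the 8th column of a VCF record
        if info in (".", ""):
            fields[7] = f"{key}={value}"
        else:
            fields[7] = f"{info};{key}={value}"
    return "\t".join(fields)


# Hypothetical annotation intervals and a single made-up VCF record.
annotations = [("chr1", 99, 200, "GENE_A"), ("chr1", 150, 300, "GENE_B")]
line = "chr1\t160\t.\tA\tG\t50\tPASS\tDP=20"
annotated = annotate_info(line, annotations)
print(annotated.split("\t")[7])
```

The real tool works on indexed annotation files and offers several ways to summarize overlapping values; the sketch only shows the basic pattern of looking up overlaps and writing the result back into INFO.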
ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2 | bioRxiv (www.biorxiv.org)
Alternative transcript visualization with ggplot2
submitted by: Mensur Dlakic
GitHub - whatshap/whatshap: Read-based phasing of genomic variants, also called haplotype assembly (github.com)
WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly. It is especially suitable for long reads, but also works well with short reads.
submitted by: Istvan Albert
Accurate assembly of multi-end RNA-seq data with Scallop2 | Nature Computational Science (www.nature.com)
Here we introduce Scallop2, a reference-based assembler optimized for multi-end RNA-seq data.
Tested on 561 cells in two Smart-seq3 datasets and on ten Illumina paired-end RNA-seq samples, Scallop2 substantially improves the assembly accuracy compared with two popular assemblers (StringTie2 and Scallop).
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription