News: samtools/bcftools/htslib 1.0 released
10.3 years ago
rtliu ★ 2.2k

[Samtools-announce] samtools/bcftools/htslib 1.0 released
From: John Marshall jm18@sa... - 2014-08-15 17:58:00

Major new versions of samtools and associated tools have been released. Significant changes include:

  • support for CRAM, a sequence data file format with better compression than BAM
  • support for BCF v2, a binary equivalent to the VCF format that is more flexible than the old BCF1
  • bcftools has been split out from samtools as a fully-featured VCF/BCF manipulation toolkit
  • much work on the variant calling algorithms
  • replacement of samtools's SAM/BAM readers and writers with new implementations in htslib
  • a more flexible API: programs can now be written against htslib's API rather than the old samtools bam.h (which will remain supported and usable for a while yet)

BCFtools has been significantly expanded to provide many tools for basic manipulation of VCF and BCF files. Both SAMtools and BCFtools now use HTSlib for their low-level SAM/BAM/CRAM and VCF/BCF file operations. Due to these reorganisations there are many small changes and improvements -- notably most tools automatically detect what kind of input files they have been provided with.

These releases can be found at http://www.htslib.org/. Development and source code of these projects are linked from that web site and can be found in various Git repositories under https://github.com/samtools. The web site at http://samtools.sourceforge.net contains information for the old 0.1.x samtools/bcftools releases which is still useful but needs some updating for the 1.0 release.

The http://www.htslib.org website will be updated over the coming days and weeks with new workflows and updated information from the old website. It has been a long time since the last samtools release -- we anticipate making more frequent samtools, bcftools, and htslib releases in future.

John

http://sourceforge.net/p/samtools/mailman/message/32723301/


The press release from the Sanger Institute may clarify some of the confusion:

Samtools CRAMS in support for improved compression formats

Key upgrade to genomics software will underpin global data sharing

Samtools 1.0 is freely available at http://www.htslib.org/. This new version supports the highly efficient genomic data format CRAM, adds new functionality, and integrates more cleanly with other tools.

Computer scientists at the Wellcome Trust Sanger Institute have released a major upgrade of Samtools, one of the most popular next-generation sequence analysis tools. The revised Samtools 1.0 enables researchers to easily compress, share and analyse genomic sequence data, reducing costs and supporting genomics research around the world.

The Global Alliance for Genomics and Health, in which the Sanger Institute is a partner, has been set up to enable researchers and clinicians to work together using standardised and efficient DNA sequence data formats to find the genetic variants responsible for disease. Samtools 1.0 supports this initiative by enabling researchers to read and write data in the new CRAM format, which was recently adopted by the Global Alliance, in addition to the existing SAM and BAM file formats for genomic sequence information.

The benefits of using CRAM are immediate: it gives a size reduction of 10-30 per cent. In addition, in a similar fashion to the JPEG format for images, CRAM supports much greater compression - up to a hundred fold - in 'lossy' mode which preserves almost all of the important information.

"This major rebuild of Samtools reflects our commitment to supporting the global use of sequencing data," says Dr Richard Durbin, Head of Computational Genomics at the Sanger Institute. "Genome science worldwide relies on fast and efficient data analysis and storage, and Samtools 1.0 fulfils this need by supporting new sequencing and analysis technologies."


Samtools software is embedded in many bioinformatics pipelines and is the foundation of many thousands of genomic research papers. Since its creation in 2009, the program has been downloaded more than 225,000 times. Samtools 1.0 is freely available at http://www.htslib.org/. This new version was substantially rewritten to support the highly efficient genomic data format CRAM, add new functionality, and integrate more cleanly with other tools.

"Samtools 1.0 embeds CRAM into genomic data analysis pipelines and removes the need for additional processing," says John Marshall, from the Sanger Institute. "This development paves the way for widespread uptake of this highly efficient file format in genomic research and will lead to lower storage costs."

The significant savings in storage that can be achieved are due to incorporating data compression techniques developed jointly by the Sanger Institute and the EMBL-European Bioinformatics Institute.

"It has been exciting to work on implementing CRAM into Samtools," says James Bonfield, at the Sanger Institute. "The great flexibility of CRAM has allowed a number of new compression techniques to be incorporated, which when combined with Samtools 1.0 will help to future-proof genomic data storage and analysis."

See also the poster "Future development of the Samtools software package": http://samtools.sourceforge.net/Samtools-GenomeInformatics2013.pdf


Well, that clarifies some aspects for those who happen to read the press release, but they are probably a very small subset of samtools users.

There should be a single page for samtools, preferably with the word samtools in its domain name. Then there should be a single place for documentation where it is clearly marked that there are other, older versions of samtools.

IMO usability and accessibility are more important and affect science more profoundly than adding new features.

10.3 years ago

I am a bit concerned that the Samtools project seems to become more and more confusing as it gets fragmented into smaller and more disparately named pieces, both in terms of functionality and documentation. It is getting increasingly difficult to sort out what is what.

For example, on http://samtools.sourceforge.net/ the top link sends the user to http://www.htslib.org/, where the page is titled Samtools(!) and appears to be the main documentation page for Samtools. But from what I understand, htslib is a C library for processing sequence data with no standalone functionality; it is meant to be embedded in other tools. Why would this be the main site for distributing samtools and bcftools information?

Now GitHub has an organization called samtools (https://github.com/samtools) that links to http://samtools.sourceforge.net/ as its homepage, a page that (as explained above) says that http://www.htslib.org/ is the actual homepage.

This organization has a repository named samtools/samtools at https://github.com/samtools/samtools, but one can't just pull and compile it: the samtools/htslib repository (https://github.com/samtools/htslib) also needs to be checked out at the same directory level as samtools. (Why does this need to be a manual process? Isn't git submodule designed to embed one project's code in another?)
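The directory layout described above, with htslib cloned next to samtools, can be checked before building. The sketch below is a hypothetical helper, not part of samtools or htslib; it only tests that a directory named htslib exists beside the samtools checkout, and the demonstration uses empty throwaway directories rather than real sources.

```shell
#!/bin/sh
# Hypothetical helper (not part of samtools or htslib): report whether an
# htslib checkout sits at the same directory level as a samtools checkout.
check_sibling_htslib() {
    samtools_dir=$1
    # dirname strips trailing slashes first, so "x/samtools/" also works
    parent=$(dirname "$samtools_dir")
    if [ -d "$samtools_dir" ] && [ -d "$parent/htslib" ]; then
        echo "ok: found $parent/htslib next to $samtools_dir"
        return 0
    fi
    echo "missing: expected an htslib checkout at $parent/htslib" >&2
    return 1
}

# Demonstration in a throwaway directory (no real sources are fetched).
demo=$(mktemp -d)
mkdir "$demo/samtools"
check_sibling_htslib "$demo/samtools" || echo "-> clone htslib into $demo first"
mkdir "$demo/htslib"
check_sibling_htslib "$demo/samtools"
```

The check only looks at directory names; it does not verify that the sibling directory is actually an htslib source tree.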

Then there is a repository called samtools/samtools.github.com, at https://github.com/samtools/samtools.github.com, that seems to contain the HTML source of a page formatted very similarly to the one at http://samtools.sourceforge.net/. This repository is served as a website at http://samtools.github.io/ (note the change of domain relative to the repository name), and on that page the most important link, Manual Page, does not work. The content on this page is far more complete, yet it seems to be in direct competition with the pages at http://www.htslib.org/ and neither mentions nor links to them.

The downloads are still distributed from SourceForge.

The potential to cause confusion leading to large amounts of wasted time and effort is quite high.


Hopefully you don't have anything dependent on the C API. That's changed and fragmented as well (and is undocumented at the moment).


Indeed, very confusing. I can't compile it: it complains about a missing htslib directory and suggests configuring with --with-htslib=DIR, but providing the DIR does nothing. Same error message no matter what.


Are you using 1.0 or 1.3?


I was referring to the samtools git repository. I ended up installing 1.3 from http://www.htslib.org/download/


Yeah, with git you would need to initialize the htslib submodule.


A chance to learn something: what do you mean by initializing the htslib submodule?


The general steps to get samtools to compile when using git are:

git clone https://github.com/samtools/samtools.git
git clone https://github.com/samtools/htslib.git
cd samtools
autoconf
make install prefix=/some/path

I'd forgotten that htslib is no longer a submodule. With previous versions, you'd run git submodule update --init after changing into the samtools directory, and you wouldn't need to clone htslib separately (or run autoconf).
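To illustrate what initializing a submodule means, here is a self-contained sketch that builds two tiny stand-in repositories locally, records one as a submodule of the other, and then clones the parent. The names htslib and samtools are used only for illustration; these repositories contain none of the real code. It assumes a reasonably recent git (the -C option needs 1.8.5 or later). The protocol.file.allow=always setting is needed only because newer git versions refuse local-file submodule URLs by default.

```shell
#!/bin/sh
# Demonstrates git submodule initialization with throwaway local repositories.
# "htslib" and "samtools" below are tiny stand-ins, not the real projects.
set -e

work=$(mktemp -d)
cd "$work"

# Run git with a fixed identity and with local-file submodule URLs allowed
# (recent git refuses file-based submodule clones unless told otherwise).
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# 1. A stand-in "htslib" repository with one committed file.
g init -q htslib
echo 'int hts_demo(void) { return 0; }' > htslib/hts.c
g -C htslib add hts.c
g -C htslib commit -q -m 'stand-in htslib'

# 2. A stand-in "samtools" repository that records htslib as a submodule.
g init -q samtools
g -C samtools submodule --quiet add "$work/htslib" htslib
g -C samtools commit -q -m 'add htslib submodule'

# 3. A plain clone: the htslib directory exists but its contents are absent.
g clone -q samtools samtools-clone
ls -A samtools-clone/htslib   # empty: submodule not fetched yet

# 4. "Initializing the submodule": register it and check out its files.
g -C samtools-clone submodule --quiet update --init
ls samtools-clone/htslib      # now contains hts.c
```

The temporary directory is left in place so the result can be inspected. With a real repository that still used a submodule, step 4 is the single command run inside the samtools checkout.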
