Question

NGS Reads Alignment

3


12.9 years ago

Leszek 4.2k

Do you know of any initiatives for NGS alignment compression?

The BAM format offers compression, but all aligned sequences and their qualities are still stored. Do you know of any reference-based compression? I think people at ENA are working on this. Have a look at CRAM.
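To make the idea of reference-based compression concrete, here is a minimal sketch: instead of storing the read sequence, store where it aligns and only the bases that differ from the reference. This is not CRAM's actual format; the function names `encode_read` and `decode_read` are made up for illustration, and the sketch assumes ungapped alignments (no indels or clipping).

```python
from typing import List, Tuple

Mismatch = Tuple[int, str]  # (offset within the read, base observed in the read)


def encode_read(reference: str, pos: int, read: str) -> Tuple[int, int, List[Mismatch]]:
    """Encode an ungapped alignment as (position, length, mismatches)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Keep only the positions where the read disagrees with the reference.
    mismatches = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return pos, len(read), mismatches


def decode_read(reference: str, pos: int, length: int, mismatches: List[Mismatch]) -> str:
    """Rebuild the read from the reference plus the stored differences."""
    bases = list(reference[pos:pos + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGTACGT"
read = "TACGAACG"  # aligns at 0-based position 3 with one mismatch
encoded = encode_read(reference, 3, read)
restored = decode_read(reference, *encoded)
```

The sequence itself is recovered losslessly as long as the same reference is available at decode time, which is the main practical dependency of any scheme like this.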

What is your opinion about keeping qualities? Maybe using some quality thresholds is reasonable? Or storing qualities only for mismatches (and maybe for ±3 bases around them)?
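The "qualities only near mismatches" idea could look roughly like the sketch below. It is lossy by design. The function name `mask_qualities`, the default flank of 3 and the filler character `#` are all assumptions chosen for illustration, not part of any existing tool.

```python
from typing import Iterable


def mask_qualities(qual: str, mismatch_offsets: Iterable[int],
                   flank: int = 3, filler: str = "#") -> str:
    """Keep quality characters within `flank` bases of a mismatch; replace the rest."""
    keep = set()
    for off in mismatch_offsets:
        lo = max(0, off - flank)
        hi = min(len(qual), off + flank + 1)
        keep.update(range(lo, hi))
    # Positions far from any mismatch collapse to one constant character,
    # which a general-purpose compressor can store very cheaply.
    return "".join(q if i in keep else filler for i, q in enumerate(qual))


qual = "IIIIHHHGGGFFFEEE"
masked = mask_qualities(qual, [8])  # one mismatch at read offset 8
```

The original qualities cannot be recovered from the masked string, so this only makes sense if downstream tools never need them outside the kept windows.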

And what about sequence headers? Do we need to keep them at all in the alignment? Storing paired-end information should be enough, in my opinion.

I'm really interested in your opinions:)

next-gen sequencing alignment bam snp • 3.2k views

ADD COMMENT • link updated 12.9 years ago by fac2003 • 0 • written 12.9 years ago by Leszek 4.2k

2


If it were just about saving space, you could get rid of the FASTQ input data once all sequences and qualities are stored in a BAM file; the FASTQ can then be regenerated from the BAM file. Discarding data is always a matter of compromise. In this case, discarding qualities and sequences compromises re-analysis. Btw, your question is about compression, but also about discarding data, and discarding data != compression.

ADD REPLY • link 12.9 years ago by Michael 55k
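As a rough illustration of why FASTQ can be regenerated when sequences and qualities are kept: a SAM/BAM record carries the read name, bases and qualities, and reverse-strand reads only need to be flipped back. The sketch below works on a single SAM text line; the function name `sam_line_to_fastq` is hypothetical, and it ignores hard clipping and secondary or supplementary records. In practice one would use an existing tool such as samtools rather than this code.

```python
# Translation table for complementing bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(sam_line: str) -> str:
    """Turn one SAM alignment line back into a four-line FASTQ record."""
    fields = sam_line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x10:
        # Reverse-strand reads are stored reverse-complemented in SAM,
        # so undo that to get the read as it came off the sequencer.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 0x40:
        name += "/1"  # first read of a pair
    elif flag & 0x80:
        name += "/2"  # second read of a pair
    return f"@{name}\n{seq}\n+\n{qual}\n"


record = "r1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tAACG\tIIH#"
fastq = sam_line_to_fastq(record)
```

If qualities or read names have been discarded from the alignment file, this round trip is no longer possible, which is the compromise described above.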

0


Yes, I'm curious what your opinions are about lossless compression vs. compression that discards some data (like sequence headers, some quals, etc.).

ADD REPLY • link 12.9 years ago by Leszek 4.2k

score 5 · Answer 1 · 2011-12-29

5


12.9 years ago

Madelaine Gogol 5.3k

There's an interesting discussion here: https://plus.google.com/107526144078068918726/posts/ZCBc8DH3yKK


score 1 · Answer 2 · 2011-12-29

Here is a paper describing a scheme for comparative compression of genomes from the same species. You might first want to figure out why you want to compress your data. If you can regenerate some of these metrics, such as quality scores, on the fly, then try compressing the data after the last computationally intensive step.

score 0 · Answer 3 · 2012-06-15

Goby 2.0 is a major milestone for the Goby project. It brings state-of-the-art NGS alignment compression as well as very robust SAM/BAM import and export. See the new tutorial ‘What’s new in Goby 2.0’ for more information.

We created a summary table to compare features of Goby 1.x, 2.0, BAM, CRAM and FASTQ. Click here to see the full table.