Difference between gam and sam/bam files?
8 weeks ago
Uveyik ▴ 50

Hi everyone, I am trying to understand pangenomic data/file formats. In vg's description of its file formats, it says gam is similar to sam/bam and usually in binary (compressed) form. Is there a difference between sam and gam in terms of format, and are gam files always compressed? If not, how can we know whether a file is binary or not?

bam pangenome gam sam • 352 views
8 weeks ago
LauferVA 4.5k

Hi everyone

Hi!

I am trying to understand pangenomic data/file formats. In vg's description of its file formats, it says gam is similar to sam/bam and usually in binary (compressed) form. Is there a difference between sam and gam in terms of format?

yes. a .sam or .bam is a linear alignment map; a .gam is a graph alignment map. this accounts for the differences in how the files are organized: where a .sam record places a read on one reference sequence with a start position and a CIGAR string, a .gam record describes the path the read takes through the pangenome graph, including the nodes visited and the edges traversed between them. this lets a GAM store alignments that are non-linear in structure.
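To make the contrast concrete, here is a rough C++ sketch of the two record shapes. The struct and field names are simplified and hypothetical; they loosely follow the idea of a path made of per-node mappings and are not vg's actual schema or API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Linear (SAM-like) alignment: one reference, one start position, one CIGAR.
struct LinearAlignment {
    std::string read_name;
    std::string ref_name;    // e.g. "chr1"
    std::int64_t pos;        // 1-based leftmost position on ref_name
    std::string cigar;       // e.g. "6M"
};

// One aligned piece within a single node (hypothetical, simplified).
struct Edit {
    std::int32_t from_length;  // bases consumed on the node
    std::int32_t to_length;    // bases consumed on the read
    std::string sequence;      // read bases where they differ from the node
};

// The part of the read that falls on one node of the graph.
struct Mapping {
    std::int64_t node_id;
    std::int64_t offset;       // start offset within the node
    bool is_reverse;           // node traversed on its reverse strand
    std::vector<Edit> edits;
};

// Graph (GAM-like) alignment: an ordered walk over nodes.
struct GraphAlignment {
    std::string read_name;
    std::string sequence;
    std::vector<Mapping> path;
};

// Total number of graph bases covered by the alignment.
std::int64_t graph_bases_covered(const GraphAlignment& aln) {
    std::int64_t total = 0;
    for (const Mapping& m : aln.path) {
        for (const Edit& e : m.edits) {
            total += e.from_length;
        }
    }
    return total;
}

// Node ids in the order the read visits them.
std::vector<std::int64_t> visited_nodes(const GraphAlignment& aln) {
    std::vector<std::int64_t> ids;
    for (const Mapping& m : aln.path) {
        ids.push_back(m.node_id);
    }
    return ids;
}
```

A read that starts in node 1 and continues into node 3 is one `GraphAlignment` with two `Mapping` entries, which a single reference name plus CIGAR string cannot express.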

are gam files always compressed?

a .bam is a binary sam file; a .sam file should be plain text. by contrast, a .gam is usually binary, but can be converted to a readable format (like json). there are also ways to compress .bam files further. suppose you are running a clinical sequencing operation, and you need to keep hundreds of (very large) .bam files in storage long-term. in this case, you may want to compress them further; CRAM files and other compression methods are used for this.

If not, how can we know if it binary or not?

there are lots of ways. probably the simplest thing is something like cat myfile.gam. if that outputs nonsense, it's binary, but this isn't the most exact method. it's better, if you're in a linux environment, to use something like:

file myfile.gam

which will indicate either that it's a binary file (reported as "data" or as some compressed-data type) or a text file such as "ASCII text". alternatively, you can use vg itself; something like vg view -a myfile.gam will output the file as a json serialization.
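If the check has to happen inside a program rather than at the shell, the same idea can be approximated by looking at the first bytes of the file. Below is a minimal C++ sketch. It assumes that a binary GAM written by vg is a gzip-compatible stream (so it starts with the gzip magic bytes 0x1f 0x8b) and that the readable form is plain ASCII text such as JSON. `FileKind`, `classify_bytes` and `classify_file` are hypothetical names, not part of vg.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

enum class FileKind { Unreadable, Gzip, OtherBinary, Text };

// Classify a buffer holding the first bytes of a file.
// Heuristic only: an empty buffer counts as Text, and any byte outside
// printable ASCII (plus tab/newline/carriage return) counts as binary.
FileKind classify_bytes(const std::vector<unsigned char>& buf) {
    // gzip streams (including BGZF) begin with the magic bytes 0x1f 0x8b.
    if (buf.size() >= 2 && buf[0] == 0x1f && buf[1] == 0x8b) {
        return FileKind::Gzip;
    }
    for (unsigned char c : buf) {
        const bool printable = (c >= 0x20 && c < 0x7f);
        const bool whitespace = (c == '\n' || c == '\r' || c == '\t');
        if (!printable && !whitespace) {
            return FileKind::OtherBinary;
        }
    }
    return FileKind::Text;
}

// Read up to max_bytes from the start of the file and classify them.
FileKind classify_file(const std::string& path, std::size_t max_bytes = 512) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return FileKind::Unreadable;
    }
    std::vector<unsigned char> buf(max_bytes);
    in.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(buf.size()));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return classify_bytes(buf);
}
```

Calling `classify_file("myfile.gam")` would then return `FileKind::Gzip` for a compressed file and `FileKind::Text` for a JSON dump. Like `file`, this only tells you how the bytes are packaged, not whether the content is a valid GAM, and text containing non-ASCII characters would be reported as `OtherBinary`.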

hope it helps!


Thanks a lot, it is really helpful. Actually, I am trying to detect whether the file is binary from inside C++ code, so which method would be better to use there? Also, can I use the same parsing code as for the sam format?
