Hello everyone,
I am working with some BAM files and trying to retrieve a FASTA sequence from them.
I have my workflow sorted out, but I have a question because the file sizes don't seem to match. My issue is as follows:
My original BAM file is 2 GB. When I sort it so that I can retrieve a sequence with the "samtools view" tool, like this:
samtools sort inputfile.bam -o inputfile.sort.bam
the resulting file is only 390 KB.
Is that normal? I have checked a region of the genome with "less", and it does contain information. Still, I am unsure whether the sorted file is truncated, since I think it should be about 2 GB like the original file.
Does anyone have an idea of what may be happening, or whether my data can be relied on?
Hello ricfoz,
it is normal for the file to be smaller after sorting, because compression works better on sorted data. But this difference is too large. Are you sure that your inputfile.bam is really a BAM file and not a SAM file? Try viewing the start of the file as plain text. If you get something human-readable, it is a SAM file, which would explain why it is so much larger.
fin swimmer
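The check suggested above could be sketched roughly as follows. This is not the command from the original reply, only an illustration: the function name guess_format is hypothetical. It relies on the fact that BAM files are BGZF-compressed and therefore begin with the gzip magic bytes 1f 8b, whereas a SAM file is plain text.

```shell
# Hypothetical helper (not part of samtools): guess whether a file is
# compressed (BAM-like) or plain text (SAM-like) from its first two bytes.
guess_format() {
    # Read the first two bytes and render them as hex, e.g. "1f8b"
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        # gzip/BGZF magic number: consistent with a real BAM file
        echo "compressed"
    else
        # anything else: likely an uncompressed file such as SAM
        echo "text"
    fi
}
```

Calling guess_format inputfile.bam would print "text" if the file is really an uncompressed SAM file with a .bam extension. For the truncation worry, running samtools quickcheck on inputfile.sort.bam should report a problem through a non-zero exit status if the end-of-file marker is missing.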