Hi all,
I am new to FASTA, FASTQ, SAM, BAM and related formats.
I am learning as I go, so I apologize for any lack of substance.
I am working on human reads: my own whole-genome sequencing data, downloaded from the service provider's website.
What I have is:
- BAM file gz compressed
- BAI file gz compressed
- FASTQ R1 file gz compressed
- FASTQ R2 file gz compressed
To speed things up I decompressed all the files, which led to the truncated-file (missing EOF) error in samtools. I don't get any error when I use the *.gz files.
Is there a way to avoid that? I tried to manually force the EOF, but I still get the warning and errors when using the samtools view command, which is an essential one.
But what is puzzling me at the moment is the CPU usage of samtools jobs. If I use the *.gz files, about 25% of each core is used; if I use the uncompressed files, only 2 to 5% of each core is used (I tried the -@ INT flag, and nothing changes).
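For anyone wanting to check the same thing: as far as I understand, samtools expects a BGZF-compressed BAM to end with a fixed 28-byte empty block, and warns when it is missing. Below is a minimal sketch in Python that only tests whether a file ends with that block. The function name `has_bgzf_eof` is my own, and the check says nothing about whether the rest of the file is valid.

```python
import os
import sys

# The 28-byte empty BGZF block that terminates a BAM file
# (the end-of-file marker described in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        # Jump to the last 28 bytes and compare them with the marker.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_eof.py file.bam
    for name in sys.argv[1:]:
        print(name, "EOF block present" if has_bgzf_eof(name) else "EOF block missing")
```

A file that has been run through plain gunzip is no longer BGZF-compressed, so I would expect this check to fail on it.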
Is that normal?
For example, when I run these commands:
- samtools index -@ INT file.bam.gz -> about 25% per core
- samtools index -@ INT file.bam -> about 2-5% per core
Many thanks to all :)
Thank you Pierre,
Thank you for the clarification about random access, I'll keep it in mind next time I compress a Jay-z flow.
What about the low core usage and the EOF warnings/errors?
For those concerned, and for the sake of substance,