3.5 years ago
James Reeve
I'm applying for computing resources on a cluster that asks for an estimate of storage space. So far I only have compressed FASTQ files, which makes it hard to judge how much total space I will need. Is there a rule of thumb for estimating the total size of the BAM/SAM files from the size of the FASTQ files?
Only within a ballpark. It will depend to a large extent on the number of secondary alignments and on whether you are going to keep unmapped reads in the BAM file.
I guess it depends on a lot of factors. For my purposes, I won't be keeping either secondary alignments or unmapped reads. Hopefully this helps a bit in coming up with a rough estimate of storage.
'Machine learning' could sample reads from a FASTQ file and then predict the size of the aligned BAM file, for sure.
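A minimal sketch of the sampling idea, with plain linear extrapolation standing in for the "machine learning": keep every n-th read from the FASTQ, align that subsample with your own aligner and the same filtering you plan for the full run (e.g. dropping secondary alignments and unmapped reads), then scale the pilot BAM's size by total reads / sampled reads. The helper names (`take_every_nth_read`, `count_fastq_reads`, `extrapolate_bam_size`) are hypothetical, the alignment step itself is not shown, and linear scaling is an assumption: BAM compression varies with sort order and read content, so treat the result as a ballpark only.

```python
import gzip
import io
from typing import TextIO


def count_fastq_reads(handle: TextIO) -> int:
    """Count reads, assuming the standard 4-lines-per-record FASTQ layout (no blank lines)."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return n_lines // 4


def count_reads_in_fastq_gz(path: str) -> int:
    """Count reads in a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as fh:
        return count_fastq_reads(fh)


def take_every_nth_read(handle: TextIO, out: TextIO, n: int) -> int:
    """Write every n-th FASTQ record (starting with the first) to `out`; return how many were kept."""
    if n < 1:
        raise ValueError("n must be >= 1")
    kept = 0
    record_index = 0
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # end of input
            break
        if not record[3]:  # header present but quality line missing
            raise ValueError("truncated FASTQ record")
        if record_index % n == 0:
            out.writelines(record)
            kept += 1
        record_index += 1
    return kept


def extrapolate_bam_size(pilot_bam_bytes: int, pilot_reads: int, total_reads: int) -> float:
    """Scale a pilot BAM size to the full read count (assumes size is roughly linear in reads)."""
    if pilot_reads <= 0:
        raise ValueError("pilot_reads must be positive")
    return pilot_bam_bytes * total_reads / pilot_reads


if __name__ == "__main__":
    # Tiny in-memory FASTQ with three records, just to exercise the helpers.
    fq = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
    sub = io.StringIO()
    kept = take_every_nth_read(io.StringIO(fq), sub, 2)
    print(kept, count_fastq_reads(io.StringIO(sub.getvalue())))
    # Hypothetical numbers: a 1 MB pilot BAM from 100k reads, scaled to 20M reads.
    print(extrapolate_bam_size(1_000_000, 100_000, 20_000_000))
```

In practice you would run `take_every_nth_read` on the `gzip.open(..., "rt")` handle of each real FASTQ file, align the subsample, and pass the resulting BAM's on-disk size (for example from `os.path.getsize`) to `extrapolate_bam_size`. Sampling every n-th read rather than the first N reads avoids over-representing the start of the run.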