We are writing the data management plan and I need to estimate, for storage costs, the size of the data we'll get per sample (as fastq.gz).
I found a very rough estimate in this comment, which matches what I know from previous experience, but I wonder whether more accurate calculators are available.
Some variables that I think are important (see the back-of-envelope sketch after the list):
- Read length
- Paired
- Targeted number of reads
- Adaptors
- UMI
- Technology (RRBS, WGS, RNAseq, 16S, metagenomics, metaproteomics)
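Roughly, what I have in mind so far is something like the sketch below. The per-base figure and the gzip ratio are my own assumptions from typical Illumina data, not measured values, and it ignores adapters, UMIs, and technology-specific quirks:

```python
# Very rough per-sample fastq.gz estimate. Assumptions (not measured values):
# ~2.3 bytes per base uncompressed (sequence + quality + headers/newlines)
# and a gzip ratio of roughly 3-5x for typical Illumina libraries.
def estimate_fastq_gz_gb(n_reads, read_length, paired=True,
                         bytes_per_base=2.3, gzip_ratio=4.0):
    ends = 2 if paired else 1
    uncompressed_bytes = n_reads * read_length * ends * bytes_per_base
    return uncompressed_bytes / gzip_ratio / 1e9

# e.g. 30 M paired-end 2x150 bp reads -> a few GB per sample
print(f"{estimate_fastq_gz_gb(30e6, 150):.1f} GB")
```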
This is not something you can be absolute about unless you have a large amount of data to refer back to. The sizes will vary with library quality, sample type, compressibility, and so on.
If you must come up with a number, look at some representative files and add 10% to cover outliers.
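A minimal sketch of that approach, assuming you have a directory of existing runs (the path and the 10% margin are just placeholders to adjust):

```python
from pathlib import Path

# Average the size of some representative fastq.gz files and add 10%.
# "representative_runs/" is a hypothetical directory; point it at your own data.
files = list(Path("representative_runs").glob("*.fastq.gz"))
sizes_gb = [f.stat().st_size / 1e9 for f in files]
estimate = (sum(sizes_gb) / len(sizes_gb)) * 1.10  # +10% to cover outliers
print(f"~{estimate:.2f} GB per file, averaged over {len(files)} files")
```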
Remind the authorities that the more successful you are, the more you are going to need (storage, that is).
Oh, I absolutely agree. I don't need a hard estimate (somewhere between 1 and 3 GB is fine), but it would be nice to have some numbers to know, more or less, whether it is better to pay for cloud storage or just keep local copies.
If you don't have your own data to use, randomly pick examples from SRA and go from there. You could also use read simulators and create dummy datasets with characteristics that you control.
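For the simulator route, even a toy generator is enough to see how read length and read count translate into compressed size; just note that random sequence compresses worse than a real library, so treat the result as a pessimistic bound. A sketch (file name and parameters are arbitrary):

```python
import gzip
import os
import random

def dummy_fastq_gz(path, n_reads, read_length, seed=0):
    """Write a dummy gzipped FASTQ with the characteristics you control."""
    rng = random.Random(seed)
    with gzip.open(path, "wt") as fh:
        for i in range(n_reads):
            seq = "".join(rng.choice("ACGT") for _ in range(read_length))
            qual = "I" * read_length  # flat qualities; real quality strings compress differently
            fh.write(f"@read{i}\n{seq}\n+\n{qual}\n")

dummy_fastq_gz("dummy.fastq.gz", n_reads=100_000, read_length=150)
print(os.path.getsize("dummy.fastq.gz") / 1e6, "MB for 100k reads; scale to your target depth")
```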
Good ideas! I'll do that. Thanks!