While downloading some raw WGBS data from the Roadmap Epigenomics project, I noticed that multiple SRA Runs were associated with each Experiment.
Reading the FAQs here, I was under the impression that only one Run can be associated with each Experiment. As far as I can tell, the Runs differ only in the "Bases" and "Bytes" columns.
For example, BioSample SAMN00857854 (GEO Accession: GSM916051) was sequenced on an Illumina HiSeq 2000 and has one Experiment, SRX142783, with the following SRA Runs:
Run: ['SRR1143696', 'SRR1143697', 'SRR1143700', 'SRR1143702', 'SRR1143704']
which correspond to:
Bases: [48905968410, 49852911810, 34303272200, 18904063200, 34950365000]
Bytes: [33485451056, 32870075947, 24289423164, 13323536868, 24641285065]
as the only metadata values that differ across Runs.
How should I handle these Runs? Is the correct approach to:
- Concatenate the FASTQ files from all Runs into one file before preprocessing? OR
- Preprocess the FASTQ files from each Run separately and treat them as technical replicates?
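
To make the first option concrete, this is roughly what I mean by concatenating. It is only a sketch: the function name `concatenate_runs` and the file names in the comments are hypothetical, and it assumes each Run has already been dumped to its own FASTQ file, with all inputs either gzip-compressed or all plain text. Gzip files can be appended byte-for-byte because a gzip stream may contain several members.

```python
import shutil
from pathlib import Path


def concatenate_runs(run_files, merged_path):
    """Append each Run's FASTQ file, byte for byte, into one merged file.

    Assumes all inputs share the same format (all .gz or all plain text).
    For paired-end data, call this once per mate and pass the Runs in the
    same order both times so that read pairs stay in sync.
    """
    merged_path = Path(merged_path)
    with merged_path.open("wb") as merged:
        for run_file in run_files:
            with Path(run_file).open("rb") as source:
                # Stream in chunks so large FASTQ files are never held in memory.
                shutil.copyfileobj(source, merged)
    return merged_path


# Hypothetical usage for the Runs listed above (file names are assumptions):
# runs = ["SRR1143696", "SRR1143697", "SRR1143700", "SRR1143702", "SRR1143704"]
# concatenate_runs([f"{r}_1.fastq.gz" for r in runs], "SRX142783_1.fastq.gz")
# concatenate_runs([f"{r}_2.fastq.gz" for r in runs], "SRX142783_2.fastq.gz")
```

The second option would instead run the preprocessing pipeline on each `SRR...` file independently and merge (or compare) the results downstream.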