NextGen Sequencing Folks, what do you keep as your raw backup? BCLs or FASTQs? Both?
2.7 years ago
msn ▴ 130

The title pretty much sums it up. Right now I am keeping both. I keep the FASTQs because if I want to come back to something in the raw counts, it's easy to do so. My idea for keeping the BCLs on the backup drives is that if newer or better algorithms come out for generating the fastqs, then it can be re-run by someone in the future to look at something different. I have lost count of the times I have come across a count matrix and wished I had the raw reads to do intron/exon analysis, velocity, or de novo assembly of V, J, C genes, but the person who ran the experiment had no idea at the time that this would be useful to anyone in the future.

At the same time, I am running out of disk space, and running to Micro Center to keep buying drives and more NAS enclosures is going to become more and more work to maintain. I feel like I can cut my storage needs dramatically if I keep only the fastqs or only the BCLs and not both. The question is, should I?
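To put a number on how much space each option would free, a rough tally of bytes per file type can help. This is only a sketch: the function name `bytes_by_group` and the suffix groups are assumptions made for illustration, so adjust them to whatever your run folders actually contain.

```python
import os
from collections import defaultdict

# Hypothetical suffix groups (an assumption); edit to match your own files.
GROUPS = {
    "bcl": (".bcl", ".bcl.gz", ".cbcl"),
    "fastq": (".fastq", ".fastq.gz", ".fq", ".fq.gz"),
}

def bytes_by_group(root):
    """Walk `root` and sum file sizes per group; unmatched files count as 'other'."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so linked files are not counted twice
            size = os.path.getsize(path)
            lower = name.lower()
            for group, suffixes in GROUPS.items():
                if lower.endswith(suffixes):
                    totals[group] += size
                    break
            else:
                totals["other"] += size
    return dict(totals)
```

Calling `bytes_by_group` on the mount point of a backup drive returns a dict of byte totals per group, which makes the BCL versus FASTQ share of the disk easy to compare.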

Thanks for your insights and help!

next-gen-sequencing backups data-storage • 1.8k views
2.7 years ago

We have been keeping the BCLs and the fastqs, though I have always argued against it. If we have a barcoding error, it should be discovered quickly, and the fastqs regenerated from the BCLs.

In practice, I have never once seen fastqs regenerated from BCL more than two weeks after the sequencing was completed. So, in my estimation, you can delete the BCLs after a month if you're happy with the initial analyses.
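A retention rule like this can be turned into a small report of run folders that are past the cutoff. The sketch below only lists candidates and deletes nothing. The function name `stale_run_folders` and the layout (one subfolder per run under a common root) are assumptions, and a folder's modification time is only a rough proxy for when the run finished.

```python
import os
import time

def stale_run_folders(root, max_age_days=30, now=None):
    """Return subfolders of `root` last modified more than `max_age_days` ago.

    Assumes one subfolder per sequencing run (hypothetical layout).
    Nothing is deleted; the result is just a list of candidates to review.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False) and entry.stat().st_mtime < cutoff:
            stale.append(entry.path)
    return stale
```

Reviewing the returned list by hand before removing anything keeps the decision with whoever signed off on the initial analyses.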

FAST5 from nanopore is a different story: those absolutely must be kept, since the basecalling algorithms are improving all the time.

Thanks for that insight!

2.7 years ago
GenoMax 147k

If you are a sequence provider then you should consider keeping a backup of the original run data folder along with the fastq files for a stated period of time. This will ensure that you will be able to reprocess the data, if that is ever needed. As colindaven points out, this period can be as short as you feel comfortable with or as long as your budget allows. At a minimum, keep the fastq files readily available (again for a stated period that is visible to users), since 99% of the time you will get requests for just the sequence files.

If you are an end-user then consider submitting fastq data to SRA/ENA. You can ask for the data to be embargoed while you work on publication so it is not immediately publicly accessible. This ensures that there will be a copy of the data available, if needed, forever.
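Whether the copy lives in an archive or on your own NAS, it helps to record a checksum for each fastq file so you can later confirm that the copy is intact. As far as I know, ENA asks for an MD5 checksum with each uploaded file, but check the archive's current submission instructions. A minimal sketch, with `md5sum` as a hypothetical helper name:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the digests next to the files makes it possible to verify a restored or re-downloaded copy before the original is deleted.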

I wish our sequence provider would keep a copy for us. Sadly, they only keep the data for 3 months, and with my backlog of analyses and experiments I might not be able to complete the QC on a data set in that time if I am working on 4 more at the same time, with results needed ASAP for a paper or grant. Academia, am I right? Hence keeping them myself.

SRA is the plan once we publish a set. If fastqs are enough to count as "raw", then I will happily delete the BCLs and free up space on the NAS for the next sequencing results.

Thank you for your additional input and advice.

2.7 years ago
Jesse ▴ 850

"My idea for keeping the BCLs on the backup drives is that if newer or better algorithms come out for generating the fastqs, then it can be re-run by someone in the future to look at something different."

Keep in mind that the BCL format just stores already-defined base calls and quality scores, not anything more raw than that. The conversion of BCL to FASTQ is really just a file format conversion. That said, since Illumina blends the file conversion with a few other tasks (like demultiplexing and the optional index read creation) in bcl2fastq, keeping the BCL files can still be prudent, just not in terms of base calling (if that's what you were thinking).
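To make that concrete, here is a sketch of decoding one uncompressed per-cycle .bcl file, following my reading of the layout in the bcl2fastq guide: a 4-byte little-endian cluster count, then one byte per cluster with the base in the low 2 bits and the quality score in the upper 6 bits. The function name `decode_bcl` is hypothetical, and newer instruments write a different (CBCL) layout, so treat this as an illustration rather than a parser.

```python
import struct

BASES = "ACGT"  # 2-bit codes 0..3 map to A, C, G, T

def decode_bcl(data):
    """Decode the bytes of one uncompressed per-cycle .bcl file.

    Returns a list of (base, quality) pairs, one per cluster.
    Layout is an assumption taken from the bcl2fastq guide, not a full parser.
    """
    (n_clusters,) = struct.unpack_from("<I", data, 0)  # 4-byte header
    calls = []
    for byte in data[4:4 + n_clusters]:
        if byte == 0:
            calls.append(("N", 0))  # an all-zero byte is reserved for a no-call
        else:
            calls.append((BASES[byte & 0b11], byte >> 2))
    return calls
```

Each byte already holds a finished base call and its quality score, so there is no signal left to re-call from.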

For more information on the format see this document, particularly page 7:

https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/bcl2fastq/bcl2fastq2-v2-20-software-guide-15051736-03.pdf
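On the demultiplexing point above, a toy version shows why a wrong barcode in the sample sheet is the kind of mistake that needs the BCLs to fix. This is not how bcl2fastq is implemented; `assign_sample`, the barcodes, and the one-mismatch default are assumptions for illustration only.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(index_read, sample_barcodes, max_mismatches=1):
    """Return the one sample whose barcode matches `index_read` within
    `max_mismatches`, or "Undetermined" if no sample or several samples match."""
    hits = [
        sample
        for sample, barcode in sample_barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "Undetermined"

# Hypothetical sample sheets: the second one has a mistyped barcode for sampleA.
correct_sheet = {"sampleA": "ACGTACGT", "sampleB": "TTGGCCAA"}
wrong_sheet = {"sampleA": "ACGTTTTT", "sampleB": "TTGGCCAA"}
```

With `correct_sheet`, a read carrying index `ACGTACGT` goes to sampleA; with `wrong_sheet` the same read ends up as "Undetermined", and repeating the assignment with a corrected sheet is the step that goes back to the BCL run folder.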
