I am getting piles of FASTQ files with names that are either generic (R12345.r1.fq) or plainly confusing (170811p1pt.r1.fq).
While storing the md5, project name etc. helps a bit, I feel that without a massive-scale rename I will not be able to make sense of the results, or even get results in the first place.
Do you guys require that such files carry some labels (dna, rna, net), followed if needed by, say, wgs, exo, tg1 etc.? Wet lab people are multiplexing samples and dumping sequencing folders with rather spartan Excel metadata. No LIMS, no consistent naming schemes.
I am renaming everything to stay sane (keeping CSV files with old_name, new_name, flowcell, machine_id, number of reads, run_date).
I would be grateful for suggestions on how to improve this. CSV -> DB with a frontend is the obvious next step.
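For concreteness, here is a minimal sketch of the rename-plus-manifest step described above. The `material_assay_sample` pattern in `new_name` and the function names are hypothetical, not an established convention. It assumes uncompressed four-line FASTQ records and Illumina-style headers (`@instrument:run:flowcell:lane:...`) for the machine and flowcell fields; run_date is not in the file and would have to come from the Excel sheet.

```python
import csv
import hashlib
from pathlib import Path

FIELDS = ["old_name", "new_name", "md5", "machine_id", "flowcell", "number_of_reads"]


def describe_fastq(path):
    """Return md5, read count and (if parseable) instrument and flowcell IDs."""
    md5 = hashlib.md5()
    n_lines = 0
    header = b""
    with open(path, "rb") as handle:
        for line in handle:
            if n_lines == 0:
                header = line
            md5.update(line)
            n_lines += 1
    # Assumption: Illumina-style header, @instrument:run:flowcell:lane:tile:x:y ...
    parts = header.decode("ascii", "replace").lstrip("@").split(":")
    machine_id, flowcell = (parts[0], parts[2]) if len(parts) >= 7 else ("", "")
    return {
        "md5": md5.hexdigest(),
        "machine_id": machine_id,
        "flowcell": flowcell,
        "number_of_reads": n_lines // 4,  # assumes 4 lines per record
    }


def new_name(material, assay, sample_id, read):
    """Build e.g. 'dna_wgs_S0001.r1.fq' (hypothetical label scheme)."""
    return f"{material}_{assay}_{sample_id}.r{read}.fq"


def rename_with_manifest(plan, directory, manifest_path):
    """Rename files per plan (old name -> new name) and append rows to the manifest."""
    directory = Path(directory)
    manifest_path = Path(manifest_path)
    write_header = not manifest_path.exists()
    with open(manifest_path, "a", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for old, new in plan.items():
            source, target = directory / old, directory / new
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            row = {"old_name": old, "new_name": new, **describe_fastq(source)}
            source.rename(target)
            writer.writerow(row)
```

Something like `rename_with_manifest({"R12345.r1.fq": new_name("dna", "wgs", "S0001", 1)}, "run_folder", "manifest.csv")` would then rename one file and log it. The md5 is computed before the rename, so the manifest can later be used to verify that the renamed file is still the original.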
A naming scheme that works universally is difficult to implement. If you deal with tens of thousands of samples for a large consortium project, then short of a LIMS/DB nothing will work.
One of the issues we deal with in a core facility is people naming their samples
Sample_101, Sample_201
etc. While it makes perfect sense for them (a code, if you will), it obviously causes issues on the core's end. A unique identifier that is generated automatically (and does not need to be human-readable) is one way of avoiding this issue. Translation of the names can also be done on the fly: store the file under any name you want, and your users will still see the name they are familiar with on the front end. This only works if they access the results you produce indirectly (via a portal, for example). If more than just you needs to access or work on the data, then implementing a proper tracking system will pay dividends in the long term, even after you leave.
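A rough sketch of the generated-identifier idea, assuming a small SQLite table is enough to start with. The table layout and the function names (`open_registry`, `register_sample`, `display_name`) are made up for illustration; a real LIMS would track far more.

```python
import sqlite3
import uuid


def open_registry(path=":memory:"):
    """Open (or create) a lookup table mapping generated IDs to user labels."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        " sample_uid TEXT PRIMARY KEY,"
        " submitter TEXT NOT NULL,"
        " user_label TEXT NOT NULL,"
        " UNIQUE (submitter, user_label))"
    )
    return conn


def register_sample(conn, submitter, user_label):
    """Assign a generated ID; a repeated submitter/label pair raises IntegrityError."""
    sample_uid = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO samples VALUES (?, ?, ?)",
            (sample_uid, submitter, user_label),
        )
    return sample_uid


def display_name(conn, sample_uid):
    """Translate a stored ID back to the label the submitter knows."""
    row = conn.execute(
        "SELECT user_label FROM samples WHERE sample_uid = ?", (sample_uid,)
    ).fetchone()
    return row[0] if row else None
```

With this, two labs can both submit a Sample_101 and each gets its own ID; files are stored under the ID, and `display_name` gives back the familiar label for whatever front end the users see.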