Is there a format to describe sample names and their associated flowcell(s), lane(s) and barcode(s) from Illumina sequencing experiments?
The Illumina documentation describes the following notation for multiplexed and non-multiplexed runs:
Naming
Illumina FASTQ files use the following naming scheme:
<sample name>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz
For example, the following is a valid FASTQ file name:
NA10831_ATCACG_L002_R1_001.fastq.gz
In the case of non-multiplexed runs, <sample name> will be replaced with the lane numbers (lane1, lane2, ..., lane8) and <barcode sequence> will be replaced with "NoIndex".
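To make the scheme above concrete, here is a minimal Python sketch that splits such a file name into its fields. The function name `parse_fastq_name`, the `FastqName` tuple, and the regular expression are my own illustration, not part of any Illumina tool. It assumes a single-index barcode made of A/C/G/T/N (or the literal "NoIndex") and does not try to handle other naming variants.

```python
import re
from typing import NamedTuple, Optional


class FastqName(NamedTuple):
    """Fields encoded in an Illumina-style FASTQ file name (hypothetical helper type)."""
    sample: str      # sample name, or "lane1".."lane8" for non-multiplexed runs
    barcode: str     # index sequence, or "NoIndex" for non-multiplexed runs
    lane: int        # lane number (0-padded to 3 digits in the file name)
    read: int        # read number (1 or 2 for paired-end data)
    set_number: int  # set number (0-padded to 3 digits in the file name)


# Assumption: the barcode is a single index of A/C/G/T/N characters or "NoIndex".
_FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)"
    r"_(?P<barcode>[ACGTN]+|NoIndex)"
    r"_L(?P<lane>\d{3})"
    r"_R(?P<read>\d+)"
    r"_(?P<set>\d{3})"
    r"\.fastq\.gz$"
)


def parse_fastq_name(filename: str) -> Optional[FastqName]:
    """Return the parsed fields, or None if the name does not follow the scheme."""
    match = _FASTQ_NAME.match(filename)
    if match is None:
        return None
    return FastqName(
        sample=match["sample"],
        barcode=match["barcode"],
        lane=int(match["lane"]),
        read=int(match["read"]),
        set_number=int(match["set"]),
    )
```

Calling `parse_fastq_name("NA10831_ATCACG_L002_R1_001.fastq.gz")` gives sample `NA10831`, barcode `ATCACG`, lane 2, read 1, set 1. Note that the flowcell is not encoded in the file name at all, which is part of why I am asking about a separate description format.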
I have also seen that bcbio has some code and example YAML files that describe some of this, and SciLifeLab seems to have adopted it:
http://bcbio-nextgen.readthedocs.io/en/latest/contents/configuration.html
https://github.com/SciLifeLab/scilifelab/blob/e5f4be45e2e9ff6c0756be46ad34dfb7d20a4b4a/scilifelab/bcbio/flowcell.py
What I am looking for is a standard, or something close to one, that people have adopted for this.
Does anything like this exist? Does the Common Workflow Language (CWL) deal with this? Galaxy? Genologics?