Hi,
Any idea if there's a way to get the instrument type (e.g. NextSeq, HiSeq, etc.) from the instrument name field in the FASTQ read header? I've tried looking for a map of sorts in Illumina's documentation, but to no avail.
Thanks,
There was a post @SeqAnswers where we had a recap of the sequencer-specific FASTQ headers (my google-fu has not turned up that post yet). The following is a very rough approximation of the sequencer-specific prefixes at the start of FASTQ headers.
Edit: Just want to add that it is possible to change the default read header in one of the setup files (so the following may not always be true).
@HWI-Mxxxx or @Mxxxx - MiSeq
@HWUSI - GAIIx
@HWI-Dxxxx - HiSeq 2000/2500
@Kxxxx - HiSeq 3000(?)/4000
@Nxxxx - NextSeq 500/550
@Axxxxx - NovaSeq
@Vxxxxx - NextSeq 2000
@AAxxxxx - NextSeq 2000 P1/P2/P3
Edit (12/2024)
_2225* - NextSeq P4
@Hxxxxxx - NovaSeq S1/S2/S4
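If you want to apply the prefix list above in a script, here is a minimal sketch. `guess_instrument` and `PREFIXES` are hypothetical names, and the mapping is only as good as the rough list above (and only holds if the default header was not reconfigured). The one design point that matters is ordering: longer prefixes have to be tested before shorter ones, otherwise `@AA...` would be reported as NovaSeq and `@HWI-M...` as the `@H` entry.

```python
from typing import Optional

# Rough prefix -> instrument map taken from the list above (not an official table).
# Checked top to bottom, so more specific prefixes come first:
# "HWI-M"/"HWI-D"/"HWUSI" before "H", and "AA" before "A".
PREFIXES = [
    ("HWI-M", "MiSeq"),
    ("HWI-D", "HiSeq 2000/2500"),
    ("HWUSI", "GAIIx"),
    ("AA", "NextSeq 2000 P1/P2/P3"),
    ("M", "MiSeq"),
    ("K", "HiSeq 3000/4000"),
    ("N", "NextSeq 500/550"),
    ("V", "NextSeq 2000"),
    ("A", "NovaSeq"),
    # Caveat: any other ID starting with "H" (e.g. an unlisted HWI-* prefix)
    # falls through to this rule.
    ("H", "NovaSeq S1/S2/S4"),
]


def guess_instrument(instrument_id: str) -> Optional[str]:
    """Return a best-guess instrument type for a header's instrument ID, or None."""
    name = instrument_id.lstrip("@")
    for prefix, instrument in PREFIXES:
        if name.startswith(prefix):
            return instrument
    return None
```

For example, `guess_instrument("@NB552493")` returns `"NextSeq 500/550"`, consistent with the answer further down this thread.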
I don't know if those are still the default headers. Our new MiSeq outputs '@Mxxx' headers so, unless the FAS reconfigured it upon installation, it's the current default. Similarly, our HiSeq headers are not prefaced with '@HISEQ'.
The FCIDs are generated automatically during the flow cell scan, which is why I recommended using those to unambiguously distinguish the instruments. But I have not yet been able to find the complete code for conversion. I'll continue digging.
That is why I added a disclaimer.
It is possible to reconfigure the default headers (we have several HiSeq/MiSeq and the headers are slightly different). I have not looked through the config files to see where exactly this is set.
I remember that the FASs have access to a special Illumina tool/database where they can look up kit/flowcell IDs, but I don't think that tool is available to us.
Do you mean FCID = name of the data folder?
That generally has the format
For HiSeq
YYMMDD_InstrumentSerialNumber_RunNumber*_[A/B]FlowCellBarcode
For MiSeq
YYMMDD_InstrumentSerialNumber_RunNumber*_000000000-FlowCellBarcode
YYMMDD - Date the run started
*RunNumber - Cumulative run number (incremented by 1 for each run) on that sequencer.
A/B - signifies the flowcell position (on instruments that can run two flowcells at a time).
These names are configurable in a .cfg settings file and will give one an idea of the kind of sequencer it is (provided the default naming scheme is left intact).
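A rough sketch of splitting a run folder name into the fields described above, assuming the default naming scheme is intact. The assumptions are mine: the HiSeq-style FCID is taken to be the nine characters after the A/B position letter, the MiSeq FCID the five characters after `000000000-`, and `parse_run_folder` is a hypothetical helper. The folder names in the usage note are made up.

```python
import re
from typing import Optional

# Default layouts described above (assumed; configurable in the instrument's .cfg):
#   HiSeq: YYMMDD_InstrumentSerialNumber_RunNumber_[A/B]FlowCellBarcode
#   MiSeq: YYMMDD_InstrumentSerialNumber_RunNumber_000000000-FlowCellBarcode
RUN_FOLDER = re.compile(
    r"^(?P<date>\d{6})_"          # YYMMDD run start date
    r"(?P<serial>[^_]+)_"         # instrument serial number
    r"(?P<run>\d+)_"              # cumulative run number on that sequencer
    r"(?:(?P<side>[AB])(?P<fcid>[A-Za-z0-9]{9})"    # A/B position + 9-char FCID
    r"|000000000-(?P<miseq_fcid>[A-Za-z0-9]{5}))$"  # MiSeq zero-padded FCID
)


def parse_run_folder(name: str) -> Optional[dict]:
    """Return the folder-name fields as a dict, or None if the name doesn't fit."""
    m = RUN_FOLDER.match(name)
    if m is None:
        return None
    return {
        "date": m["date"],
        "serial": m["serial"],
        "run": int(m["run"]),
        "side": m["side"],  # None for MiSeq folders
        "fcid": m["fcid"] or m["miseq_fcid"],
    }
```

For example, `parse_run_folder("150101_D00123_0123_AC8GBKANXX")` gives serial `D00123`, run 123, side `A` and FCID `C8GBKANXX`. A folder renamed away from the default returns `None` rather than a wrong guess.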
Yes! That's what I was looking for, and I vaguely remember seeing the same post on seq answers too! This post maybe?
I realize that this is a year and a half late, but here is some code from 10X that incorporates both the flow cell ID and the machine ID to figure out the run type.
Here is the flowcell barcode pattern for NovaSeq 6000, covering every flowcell type: [A-Za-z0-9]{5}D[SMR]X[XY2357]
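To test an FCID against that NovaSeq 6000 pattern, a minimal sketch (the `{1}` quantifier from the pattern is dropped since it is redundant; `is_novaseq_6000_fcid` is a hypothetical name). `re.fullmatch` is used so that a longer string containing a matching substring is not accepted. The example FCIDs in the tests are made up.

```python
import re

# Pattern as posted above: 5 alphanumerics, then D, then S/M/R, then X, then X/Y/2/3/5/7.
NOVASEQ_6000_FCID = re.compile(r"[A-Za-z0-9]{5}D[SMR]X[XY2357]")


def is_novaseq_6000_fcid(fcid: str) -> bool:
    """True if the whole FCID matches the NovaSeq 6000 pattern."""
    return NOVASEQ_6000_FCID.fullmatch(fcid) is not None
```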
Illumina provided the following information regarding FCID mapping to instruments. The indicated characters are the last four of the nine-character FCID (except for MiSeq). Please note that this is not comprehensive and there may be new additions in the future.
AAXX = Genome Analyzer
BCXX = HiSeq v1.5
ACXX = HiSeq High-Output v3
ANXX = HiSeq High-Output v4
ADXX = HiSeq RR v1
AMXX, BCXX = HiSeq RR v2
ALXX = HiSeqX
BGXX, AGXX = High-Output NextSeq
AFXX = Mid-Output NextSeq
5 letter/number = MiSeq
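A minimal lookup sketch built from exactly the table above, nothing more. Because BCXX appears under both HiSeq v1.5 and HiSeq RR v2, each suffix maps to a list of candidates rather than a single answer. Stripping a leading `000000000-` (the MiSeq folder format quoted earlier) is my assumption, and `candidates_from_fcid` is a hypothetical helper. Unknown suffixes return an empty list, since the table is explicitly not comprehensive.

```python
from typing import List

# Last-four-character suffixes from the table above. BCXX is listed twice there,
# so it maps to both candidates.
FCID_SUFFIXES = {
    "AAXX": ["Genome Analyzer"],
    "BCXX": ["HiSeq v1.5", "HiSeq RR v2"],
    "ACXX": ["HiSeq High-Output v3"],
    "ANXX": ["HiSeq High-Output v4"],
    "ADXX": ["HiSeq RR v1"],
    "AMXX": ["HiSeq RR v2"],
    "ALXX": ["HiSeqX"],
    "BGXX": ["High-Output NextSeq"],
    "AGXX": ["High-Output NextSeq"],
    "AFXX": ["Mid-Output NextSeq"],
}

MISEQ_PAD = "000000000-"  # assumed zero-padding seen in MiSeq folder names


def candidates_from_fcid(fcid: str) -> List[str]:
    """Return candidate instruments for an FCID (empty list if unknown)."""
    if fcid.startswith(MISEQ_PAD):
        fcid = fcid[len(MISEQ_PAD):]
    if len(fcid) == 5 and fcid.isalnum():  # "5 letter/number = MiSeq"
        return ["MiSeq"]
    return FCID_SUFFIXES.get(fcid[-4:].upper(), [])
```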
Thank you for getting this from Illumina.
While it does not conclusively associate a run with the kind of sequencer used (I think some of the HiSeq flowcells can be used on multiple models), it does provide useful information. Also, the chances of the flowcell ID being altered (except with SRA :-)) are small, so this association should be more reliable.
I believe you can extract the instrument type from the FCID in the read identifier (e.g. our HiSeq FCIDs all end in 'ACXX', MiSeqs all start with 'MS'), but you may need to contact Illumina to determine the complete code.
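To get at both the instrument ID and the FCID from a read, here is a sketch that assumes the Casava 1.8+ header layout `@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index>`. Older layouts (e.g. GAIIx-era `#0/1` headers) or reformatted headers, such as those from SRA, will not fit and return `None`. `parse_read_header` and `HeaderFields` are hypothetical names.

```python
from typing import NamedTuple, Optional


class HeaderFields(NamedTuple):
    instrument: str
    run: int
    fcid: str
    lane: int


def parse_read_header(line: str) -> Optional[HeaderFields]:
    """Pull instrument, run, FCID and lane from a Casava 1.8+ style header.

    Returns None when the line does not have the 7 colon-separated fields
    expected before the first space.
    """
    if not line.startswith("@"):
        return None
    fields = line[1:].split(" ", 1)[0].split(":")
    if len(fields) != 7:
        return None
    instrument, run, fcid, lane = fields[:4]
    if not (run.isdigit() and lane.isdigit()):
        return None
    return HeaderFields(instrument, int(run), fcid, int(lane))
```

For example, `parse_read_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")` gives instrument `EAS139` and FCID `FC706VJ`. Those two fields can then be compared against the prefix and suffix lists in this thread.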
This page from Illumina also has some helpful information regarding flowcell serial numbers. The following R snippet may also be of use (it's just concatenated from the information mentioned here and above):
library(dplyr)
library(stringr)

# `runs`: your data frame with columns `sn` (instrument ID) and `fc` (flowcell ID)
runs %>%
  mutate(
    sn_infer = case_when(
      str_detect(sn, "^HWI-M|^M") ~ "MiSeq",
      str_detect(sn, "^HWUSI") ~ "GAIIx",
      str_detect(sn, "^HWI-D") ~ "HiSeq 2x00",
      str_detect(sn, "^K") ~ "HiSeq 3/4000",
      str_detect(sn, "^N") ~ "NextSeq 5x0",
      # case_when() returns the first match, so "^AA" must be tested before "^A"
      str_detect(sn, "^V|^AA") ~ "NextSeq 2000",
      str_detect(sn, "^A|^H") ~ "NovaSeq"
    ),
    fc_infer = case_when(
      str_detect(fc, "^(BRB|BP[ACGL]|BNT)") ~ "iSeq_100",
      str_detect(fc, "000H") ~ "MiniSeq",
      # restrict to 5-character MiSeq IDs so 9-character HiSeq FCIDs starting
      # with B/C/D (e.g. ...ANXX) are not caught here
      str_detect(fc, "^(000000000-)?[BCJKDG][A-Z0-9]{4}$") ~ "MiSeq",
      str_detect(fc, "(A[FG]|BG)..$") ~ "NextSeq_500/550",
      str_detect(fc, "M5$") ~ "NextSeq_1000/2000",
      str_detect(fc, "HV$") ~ "NextSeq_2000",
      str_detect(fc, "([AB]C|AN)..$") ~ "HiSeq_2500",
      str_detect(fc, "BB..$") ~ "HiSeq_3000/4000",
      str_detect(fc, "(AL|CC)..$") ~ "HiSeq_X",
      str_detect(fc, "D[RS]..$|DM.$") ~ "NovaSeq_6000"
    )
  )
I have run into this question several times over the years, and wrote this package to provide a command line tool for this problem. PRs and other feedback welcome! https://github.com/nickp60/fcid
$ fcid 22C37GLT3
NovaSeq X,NovaSeq X Plus
$ fcid MN10009 --by-machine
MiniSeq
Hi!
Does anybody know which instrument this header comes from?
@NB552493
That is either a NextSeq 500 or 550.
Awesome, thanks!