Question

Illumina Instrument Type from fastq?

11

8.4 years ago

andrew.j.skelton73 6.6k

Hi,

Is there a way to get the instrument type (e.g. NextSeq, HiSeq) from the instrument name field in the FASTQ read header? I've looked for a mapping in Illumina's documentation, but to no avail.

Thanks,

fastq • 37k views

ADD COMMENT • link updated 8 months ago by nickp60 ▴ 60 • written 8.4 years ago by andrew.j.skelton73 6.6k

0

Biostars.org is my one-stop solution for bioinformatics info and analysis software. Keep it up.

ADD REPLY • link 8.4 years ago by mmulinge • 0

0

Hi!

Does anybody know which instrument this header comes from?

@NB552493

ADD REPLY • link 19 months ago by Olov • 0

0

That is either a NextSeq 500 or 550.

ADD REPLY • link 19 months ago by GenoMax 147k

0

Awesome thanks

ADD REPLY • link 19 months ago by Olov • 0

2

8.4 years ago

harold.smith.tarheel ★ 5.0k

I believe you can extract the instrument type from the FCID in the read identifier (e.g. our HiSeq FCIDs all end in 'ACXX', MiSeqs all start with 'MS'), but you may need to contact Illumina to determine the complete code.

ADD COMMENT • link 8.4 years ago by harold.smith.tarheel ★ 5.0k

1

This is along the lines of what I was thinking, as there are similar tags in all the NextSeq data I see coming through. However, I was hoping somebody had a handy map of these in some dark corner of the internet that I've not come across yet!

ADD REPLY • link 8.4 years ago by andrew.j.skelton73 6.6k

2

19 months ago

Nathan ▴ 20

This page from Illumina also has some helpful information regarding flowcell serial numbers. The following R snippet may also be of use (it's just concatenated from the information mentioned here and above):

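  library(dplyr)    # mutate(), case_when()
  library(stringr)  # str_detect()

  # Assumes `runs` is a data frame with one row per run and two character
  # columns: `sn` (instrument serial number) and `fc` (flowcell ID).
  # case_when() returns the first match, so specific patterns come first.
  runs <- runs %>%
    mutate(
      sn_infer = case_when(
        str_detect(sn, "^HWI-M|^M") ~ "MiSeq",
        str_detect(sn, "^HWUSI") ~ "GAIIx",
        str_detect(sn, "^HWI-D") ~ "HiSeq 2x00",
        str_detect(sn, "^K") ~ "HiSeq 3/4000",
        str_detect(sn, "^N") ~ "NextSeq 5x0",
        # "^AA" has to be tested before "^A", otherwise it can never match
        str_detect(sn, "^V|^AA") ~ "NextSeq 2000",
        str_detect(sn, "^A|^H") ~ "NovaSeq"
      ),
      fc_infer = case_when(
        str_detect(fc, "^(BRB|BP[ACGL]|BNT)") ~ "iSeq_100",
        str_detect(fc, "000H") ~ "MiniSeq",
        # MiSeq IDs are five characters; anchoring the length keeps this rule
        # from capturing nine-character IDs that begin with the same letters
        str_detect(fc, "^[BCJKDG].{4}$") ~ "MiSeq",
        str_detect(fc, "(A[FG]|BG)..$") ~ "NextSeq_500/550",
        str_detect(fc, "M5$") ~ "NextSeq_1000/2000",
        str_detect(fc, "HV$") ~ "NextSeq_2000",
        str_detect(fc, "([AB]C|AN)..$") ~ "HiSeq_2500",
        str_detect(fc, "BB..$") ~ "HiSeq_3000/4000",
        str_detect(fc, "(AL|CC)..$") ~ "HiSeq_X",
        str_detect(fc, "D[RS]..$|DM.$") ~ "NovaSeq_6000"
      )
    )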

ADD COMMENT • link 19 months ago by Nathan ▴ 20

0

The link included in this answer no longer appears to be working.

ADD REPLY • link 9 months ago by GenoMax 147k

2

8 months ago

nickp60 ▴ 60

I have run into this question several times over the years, and wrote this package to provide a command line tool for this problem. PRs and other feedback welcome! https://github.com/nickp60/fcid

$ fcid 22C37GLT3
NovaSeq X,NovaSeq X Plus
$ fcid MN10009 --by-machine
MiniSeq

ADD COMMENT • link 8 months ago by nickp60 ▴ 60

score 29 · Accepted Answer · 2016-06-22

29

8.4 years ago

GenoMax 147k

There was a post on SEQanswers where we recapped the sequencer-specific FASTQ headers (my google-fu has not turned up that post yet). The following is a very rough approximation of the sequencer-specific prefixes that FASTQ headers start with.

Edit: Just want to add that it is possible to change the default read header in one of the setup files (so the following may not always be true).

@HWI-Mxxxx or @Mxxxx - MiSeq
@HWUSI - GAIIx
@HWI-Dxxxx - HiSeq 2000/2500
@Kxxxx - HiSeq 3000(?)/4000
@Nxxxx - NextSeq 500/550
@Axxxxx - NovaSeq
@Vxxxxx - NextSeq 2000

Edit (08/2022):
@AAxxxxx - NextSeq 2000 P1/P2/P3
@Hxxxxxx - NovaSeq S1/S2/S4
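
For anyone who wants to apply the list above programmatically, here is a minimal Python sketch. The table is only a transcription of the prefixes in this answer, and the names `INSTRUMENT_PREFIXES` and `guess_instrument` are made up for illustration. Since the headers are configurable, treat the result as a guess rather than a definitive call.

```python
# Prefix -> instrument guesses, transcribed from the list in this answer.
# Longer prefixes come first so that "AA" is tested before "A" and
# "HWI-M" before "H".
INSTRUMENT_PREFIXES = [
    ("HWI-M", "MiSeq"),
    ("HWUSI", "GAIIx"),
    ("HWI-D", "HiSeq 2000/2500"),
    ("AA", "NextSeq 2000 P1/P2/P3"),
    ("M", "MiSeq"),
    ("K", "HiSeq 3000/4000"),
    ("N", "NextSeq 500/550"),
    ("A", "NovaSeq"),
    ("V", "NextSeq 2000"),
    ("H", "NovaSeq S1/S2/S4"),
]


def guess_instrument(name):
    """Return a best-guess instrument type, or None if no prefix matches."""
    name = name.strip().lstrip("@")
    for prefix, instrument in INSTRUMENT_PREFIXES:
        if name.startswith(prefix):
            return instrument
    return None


print(guess_instrument("@NB552493"))  # matches the "N" prefix
```

Matching this coarsely has obvious limits: an ID such as MN10009, which the fcid tool mentioned in this thread reports as a MiniSeq, would be caught by the "M" rule here.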

ADD COMMENT • link 2.3 years ago by GenoMax 147k

2

Actually, the MiSeq read headers should be @Mxxxx

ADD REPLY • link 8.4 years ago by igor 13k

1

I don't know if those are still the default headers. Our new MiSeq outputs '@Mxxx' headers so, unless the FAS reconfigured it upon installation, it's the current default. Similarly, our HiSeq headers are not prefaced with '@HISEQ'.

The FCIDs are generated automatically during the flow cell scan, which is why I recommended using those to unambiguously distinguish the instruments. But I have not yet been able to find the complete code for conversion. I'll continue digging.

ADD REPLY • link 8.4 years ago by harold.smith.tarheel ★ 5.0k

1

That is why I added a disclaimer.

It is possible to reconfigure the default headers (we have several HiSeq/MiSeq and the headers are slightly different). I have not looked through the config files to see where exactly this is set.

I remember that the FASs have access to a special Illumina tool/database where they can look up kit/flowcell IDs, but I don't think that tool is available to us.

ADD REPLY • link 8.4 years ago by GenoMax 147k

1

Do you mean FCID = Name of the data folder?

That generally has the format

For HiSeq

YYMMDD_InstrumentSerialNumber_RunNumber*_[A/B]FlowCellBarcode

For MiSeq

YYMMDD_InstrumentSerialNumber_RunNumber*_000000000-FlowCellBarcode

YYMMDD - Date the run started
*RunNumber - Cumulative run number (incremented by 1 for each run) on that sequencer.
A/B - signifies the flowcell position (in case of instruments that can run 2 FC at a time).

These names are configurable in a .cfg settings file and will give one an idea of the kind of sequencer it is (provided the default naming scheme is left intact).
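
A rough Python sketch of splitting such a folder name into its parts, assuming the default naming scheme described above is intact. The function name `parse_run_folder` and the folder names in the usage lines are hypothetical, and the A/B detection relies on HiSeq flowcell barcodes being nine characters long.

```python
import re
from datetime import datetime

# YYMMDD_InstrumentSerialNumber_RunNumber_<flowcell part>
RUN_FOLDER = re.compile(
    r"^(?P<date>\d{6})_(?P<serial>[^_]+)_(?P<run>\d+)_(?P<tail>[^_]+)$"
)
MISEQ_PREFIX = "000000000-"


def parse_run_folder(name):
    """Parse a run folder name; return None if it does not fit the scheme."""
    match = RUN_FOLDER.match(name)
    if match is None:
        return None
    tail = match.group("tail")
    position = None
    if tail.startswith(MISEQ_PREFIX):
        # MiSeq style: 000000000-FlowCellBarcode
        flowcell = tail[len(MISEQ_PREFIX):]
    elif len(tail) == 10 and tail[0] in "AB":
        # HiSeq style: position letter followed by a nine-character barcode
        position, flowcell = tail[0], tail[1:]
    else:
        flowcell = tail
    return {
        "date": datetime.strptime(match.group("date"), "%y%m%d").date(),
        "serial": match.group("serial"),
        "run_number": int(match.group("run")),
        "position": position,
        "flowcell": flowcell,
    }


# Hypothetical folder names, for illustration only
print(parse_run_folder("160622_D00123_0045_AC8ABCANXX"))
print(parse_run_folder("160622_M01234_0012_000000000-AB1CD"))
```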

ADD REPLY • link 8.4 years ago by GenoMax 147k

1

FCID = flow cell identifier (same as your FlowCellBarcode). In the read identifier, it's the third field.
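
To make the field positions concrete, here is a small Python sketch that pulls the instrument name, run number, FCID and lane out of a colon-separated read identifier. It assumes the header still uses the default seven-field layout (instrument:run:flowcell:lane:tile:x:y); `parse_read_id` is just an illustrative name.

```python
def parse_read_id(header):
    """Split an Illumina-style read identifier into its leading fields.

    Returns None when the identifier has fewer than seven colon-separated
    fields, e.g. older or customised headers with no flow cell field.
    """
    # Drop the leading "@" and anything after the first space
    # (the read number / filter flag / index part).
    name = header.strip().lstrip("@").partition(" ")[0]
    fields = name.split(":")
    if len(fields) < 7:
        return None
    instrument, run_number, flowcell, lane = fields[:4]
    return {
        "instrument": instrument,
        "run_number": run_number,
        "flowcell": flowcell,  # the FCID is the third field
        "lane": lane,
    }


# Identifier taken from an example quoted elsewhere in this thread
print(parse_read_id("@K00150:65:H77FVBBXX:7:1101:1570:1297"))
```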

ADD REPLY • link 8.4 years ago by harold.smith.tarheel ★ 5.0k

0

Yes! That's what I was looking for, and I vaguely remember seeing the same post on seq answers too! This post maybe?

ADD REPLY • link 8.4 years ago by andrew.j.skelton73 6.6k

1

That is one of them. I seem to recollect that there was another one specifically like the question you asked.

ADD REPLY • link 8.4 years ago by GenoMax 147k

0

HiSeq 3000 starts with a J, not a K. Also, our HiSeq 2500 defaults to SNXXXXX.

Edit: Apparently HiSeq 3000s can start with either (consistency!). I bet our HiSeq 2500 starts with an S because it started life as a 2000 and got an upgrade.

ADD REPLY • link 7.0 years ago by Devon Ryan 104k

score 13 · Accepted Answer · 2017-12-07

13

7.0 years ago

Devon Ryan 104k

I realize that this is a year and a half late, but here is some code from 10X that incorporates both the flow cell ID and the machine ID to figure out the run type.

ADD COMMENT • link 7.0 years ago by Devon Ryan 104k

0

Fantastic find - really useful.

ADD REPLY • link 7.0 years ago by andrew.j.skelton73 6.6k

0

I wonder if anyone has NovaSeq data so we could add that info here...

ADD REPLY • link 7.0 years ago by Devon Ryan 104k

2

Not 100% sure: S2 cells "H[A-Z,0-9]{4}DMXX$"
Sequencer serial may be "A[0-9]{6}$" for NovaSeq 6000.

ADD REPLY • link 7.0 years ago by GenoMax 147k

2

NovaSeq S4 flowcell barcodes are ^H[A-Z,0-9]{4}DSXX$

ADD REPLY • link 6.8 years ago by Dan D 7.4k

0

Well that was quick (I notice now that 10X has a line for the NovaSeq in the source code I linked to).

ADD REPLY • link 7.0 years ago by Devon Ryan 104k

0

Guess that number is for S2 FC. So someone will have to give us data for S1/S4.

ADD REPLY • link 7.0 years ago by GenoMax 147k

0

Every NovaSeq serial I've seen has been of the form ^A\d{5}$ (five, not six digits). That's just at one site though.

ADD REPLY • link 6.8 years ago by Dan D 7.4k

1

Still doing calibration runs on the NovaSeq at my institute, but I'll update when I get some data through.

ADD REPLY • link 7.0 years ago by andrew.j.skelton73 6.6k

0

Here are the flowcell barcodes for NovaSeq 6000. For every type: [A-Za-z0-9]{5}D[SMR]{1}X[XY2357]
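
A quick way to try that pattern in Python, as a sketch only. The redundant `{1}` is dropped, `fullmatch` is used so the whole barcode has to fit, and the two D-series barcodes in the usage lines are made up to follow the S2/S4 forms mentioned earlier in this thread.

```python
import re

# Pattern from the comment above, minus the redundant {1}
NOVASEQ_6000_FC = re.compile(r"[A-Za-z0-9]{5}D[SMR]X[XY2357]")


def is_novaseq_6000_flowcell(fcid):
    """True if the whole flowcell barcode fits the NovaSeq 6000 pattern."""
    return NOVASEQ_6000_FC.fullmatch(fcid) is not None


# Hypothetical S4-style and S2-style barcodes
print(is_novaseq_6000_flowcell("HABCDDSXX"))
print(is_novaseq_6000_flowcell("HABCDDMXX"))
# A BBXX barcode quoted elsewhere in this thread should not match
print(is_novaseq_6000_flowcell("H77FVBBXX"))
```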

ADD REPLY • link 9 months ago by ajay nair ▴ 50

0

Does anyone know what an instrument ID starting with "@ST-E" means?

ADD REPLY • link 6.6 years ago by i.sudbery 20k

0

Is this recent data? Could it be from an iSeq 100?

ADD REPLY • link 6.6 years ago by GenoMax 147k

0

I doubt it; there are 30M reads in each sample.

ADD REPLY • link 6.6 years ago by i.sudbery 20k

0

True. Do you have the FC barcode?

ADD REPLY • link 6.6 years ago by GenoMax 147k

0

It's a HiSeq X. If you check some of Illumina's public data from one of those instruments, you will find that the instrument ID starts with either an E or an ST-E.

ADD REPLY • link 5.8 years ago by mvizue01 • 0

score 7 · Accepted Answer · 2016-06-23

7

8.4 years ago

harold.smith.tarheel ★ 5.0k

Illumina provided the following information regarding FCID mapping to instruments. The indicated characters are the last four of the nine-character FCID (excepting MiSeq). Please note that this is not comprehensive and there may be new additions in the future.

AAXX = Genome Analyzer
BCXX = HiSeq v1.5
ACXX = HiSeq High-Output v3
ANXX = HiSeq High-Output v4
ADXX = HiSeq RR v1
AMXX, BCXX = HiSeq RR v2
ALXX = HiSeqX
BGXX, AGXX = High-Output NextSeq
AFXX = Mid-Output NextSeq
5 letter/number = MiSeq
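
As a sketch of how this table could be used, the Python below looks up the last four characters of a nine-character FCID and treats a five-character ID as MiSeq. The dictionary is only a transcription of the list above (so it is just as incomplete), BCXX appears twice in that list and therefore maps to two candidates, and the names `FCID_SUFFIXES` and `candidates_from_fcid` are made up.

```python
# Last four FCID characters -> candidate instrument/flowcell types,
# transcribed from the list above. BCXX is listed twice there.
FCID_SUFFIXES = {
    "AAXX": ["Genome Analyzer"],
    "BCXX": ["HiSeq v1.5", "HiSeq RR v2"],
    "ACXX": ["HiSeq High-Output v3"],
    "ANXX": ["HiSeq High-Output v4"],
    "ADXX": ["HiSeq RR v1"],
    "AMXX": ["HiSeq RR v2"],
    "ALXX": ["HiSeqX"],
    "BGXX": ["High-Output NextSeq"],
    "AGXX": ["High-Output NextSeq"],
    "AFXX": ["Mid-Output NextSeq"],
}
MISEQ_PREFIX = "000000000-"


def candidates_from_fcid(fcid):
    """Return the candidate types for an FCID (empty list if unknown)."""
    fcid = fcid.strip().upper()
    if fcid.startswith(MISEQ_PREFIX):
        fcid = fcid[len(MISEQ_PREFIX):]
    if len(fcid) == 5:
        return ["MiSeq"]
    if len(fcid) == 9:
        return list(FCID_SUFFIXES.get(fcid[-4:], []))
    return []


# Hypothetical FCIDs, for illustration only
print(candidates_from_fcid("C8ABCANXX"))
print(candidates_from_fcid("C8ABCBCXX"))
print(candidates_from_fcid("000000000-AB1CD"))
```

Returning a list rather than a single label keeps the ambiguity visible instead of silently picking one answer.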

ADD COMMENT • link 8.4 years ago by harold.smith.tarheel ★ 5.0k

0

Thank you for getting this from Illumina.

While it does not conclusively associate a run with the kind of sequencer used (I think some of the HiSeq flowcells can be used on multiple models), it does provide useful information. Also, the chances of the flowcell ID being altered (except with SRA :-)) are small, so this association should be more reliable.

ADD REPLY • link 8.4 years ago by GenoMax 147k

0

What kind of instrument uses flowcells ending with BBXX? Here is the read sequence id line:

@ERR1417747.1 K00150:65:H77FVBBXX:7:1101:1570:1297 length=150

ADD REPLY • link 7.7 years ago by steven.davis ▴ 10

0

The first part, @ERR1417747, is the SRA accession ID that gets tacked on (unless you use -F to regenerate sequence headers in the original Illumina format). Given the sequencer ID K00150 that follows, this should be a HiSeq 3000/4000 run.
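
If you would rather not re-dump the data, a small Python sketch like the one below can skip the accession and pick out the original Illumina name. It simply takes the first whitespace-separated token that contains at least six colons, which is an assumption about the default header layout; `illumina_ids_from_header` is a made-up name.

```python
def illumina_ids_from_header(header):
    """Return (instrument, flowcell) from a native or SRA-prefixed header.

    Scans whitespace-separated tokens for the first one that looks like
    instrument:run:flowcell:lane:tile:x:y; returns None if none is found.
    """
    for token in header.strip().lstrip("@").split():
        if token.count(":") >= 6:
            fields = token.split(":")
            return fields[0], fields[2]
    return None


header = "@ERR1417747.1 K00150:65:H77FVBBXX:7:1101:1570:1297 length=150"
print(illumina_ids_from_header(header))
```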

ADD REPLY • link 7.7 years ago by GenoMax 147k

0

I got some fastq files whose read IDs look like this:

@FCH7HCYADXY:1:1101:10595:1825

What kind of instrument could it be? How can I get the latest FCID description?

ADD REPLY • link 7.3 years ago by liangzebin5566 ▴ 30