There are a lot of publicly available sample FASTQ files.
How can I tell whether they were generated on Illumina machines, ONT (Oxford Nanopore Technologies) machines, or some other platform?
Assuming the data has not been modified in any way, you could look for a few things:
Read length (for the majority of reads)
Illumina: 300 bp or less (600 bp reads are possible, but generally no one does that)
Nanopore: reads are generally longer than 300 bp, often several kilobases
Fastq headers
The Illumina header format is well defined: https://en.wikipedia.org/wiki/FASTQ_format#Illumina_sequence_identifiers
ONT headers look like this:
@3cfgt6cd-3671-4tgd-c61g-0c759bt068d0 runid=eb6214851489c8e00eb0dcbd00d737f7cccxxxx read=2178 ch=232 start_time=2021-11-21T10:35:44Z
Quality scores
Illumina reads: scores are mostly high, around Q35 or above
Nanopore reads: scores are generally lower
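The three checks above can be combined into a rough script. This is only a sketch under some assumptions: the function names (`read_fastq`, `guess_platform`) are made up for illustration, the Illumina pattern only covers the newer colon-separated header style (Casava 1.8+), the ONT pattern expects a UUID-style read ID or `runid=`/`ch=` fields, and qualities are assumed to be Phred+33. Headers are trusted first, read length (using the 300 bp cutoff mentioned above) is the fallback, and mean quality is only reported, not used for the decision.

```python
import re
import statistics
from itertools import islice

# Assumed pattern for Casava 1.8+ style Illumina headers:
# @instrument:run:flowcell:lane:tile:x:y
ILLUMINA_RE = re.compile(r"^@[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+")
# Assumed pattern for ONT headers: the read ID is a UUID
ONT_RE = re.compile(r"^@[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")


def read_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (s.rstrip("\r\n") for s in record)
        yield header, seq, qual


def guess_platform(lines, max_reads=1000, short_read_max=300, offset=33):
    """Return (guess, stats) based on the first max_reads records."""
    illumina_hits = ont_hits = 0
    lengths, quals = [], []
    for header, seq, qual in islice(read_fastq(lines), max_reads):
        if ILLUMINA_RE.match(header):
            illumina_hits += 1
        elif ONT_RE.match(header) or ("runid=" in header and "ch=" in header):
            ont_hits += 1
        lengths.append(len(seq))
        if qual:
            # Per-read mean Phred score, assuming Phred+33 encoding
            quals.append(sum(ord(c) - offset for c in qual) / len(qual))
    if not lengths:
        return "unknown", {}

    n = len(lengths)
    stats = {
        "reads": n,
        "median_length": statistics.median(lengths),
        "mean_quality": statistics.fmean(quals) if quals else None,
        "illumina_headers": illumina_hits,
        "ont_headers": ont_hits,
    }
    # Headers are the strongest evidence; fall back to read length otherwise
    if illumina_hits > n / 2:
        guess = "Illumina"
    elif ont_hits > n / 2:
        guess = "ONT"
    elif stats["median_length"] <= short_read_max:
        guess = "short-read (Illumina-like)"
    else:
        guess = "long-read (ONT-like)"
    return guess, stats
```

For a gzipped file you could pass an open text handle, e.g. `with gzip.open("sample.fastq.gz", "rt") as fh: print(guess_platform(fh))` (the file name here is a placeholder). Files downloaded from public archives often have rewritten headers, in which case only the length fallback and the reported mean quality are informative.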