Repeated read IDs from Illumina MiniSeq fastq.gz files?
8.2 years ago

Has anyone experienced repeated read ids from Illumina MiniSeq fastq.gz files?

I have seen a few cases where the fastq.gz files generated from the BCL files of a MiniSeq run contain a handful of reads that are printed twice in a row.

Has anyone else here run into this? An example:

$ zgrep -A3 -n '@MN00325:3:000H223KC:1:11106:5362:8398\ 1:N:0:GTCCGC' SAMPLE01_S17_L001_R1_001.fastq.gz
SAMPLE01_S17_L001_R1_001.fastq.gz:1162953:@MN00325:3:000H223KC:1:11106:5362:8398 1:N:0:GTCCGC
SAMPLE01_S17_L001_R1_001.fastq.gz:1162954-AAAAAAATAAATAATTTTATTAAAAAGTGGGTAAAGCATATGAATGGATATTTTTTAAAAGAAGATATTTATGTAC
SAMPLE01_S17_L001_R1_001.fastq.gz:1162955-+
SAMPLE01_S17_L001_R1_001.fastq.gz:1162956-AFFFFFFFFFFFFFF//=F//FFFFFF66AFAFFFFFFFF//FF6//F/F=FA//AFFFF/FF/FFF///FAA/F/
SAMPLE01_S17_L001_R1_001.fastq.gz:1162957:@MN00325:3:000H223KC:1:11106:5362:8398 1:N:0:GTCCGC
SAMPLE01_S17_L001_R1_001.fastq.gz:1162958-AAAAAAATAAATAATTTTATTAAAAAGTGGGTAAAGCATATGAATGGATATTTTTTAAAAGAAGATATTTATGTAC
SAMPLE01_S17_L001_R1_001.fastq.gz:1162959-+
SAMPLE01_S17_L001_R1_001.fastq.gz:1162960-AFFFFFFFFFFFFFF//=F//FFFFFF66AF/FFFFFFFF//FF66/F6F=FF//AFFFF/FF/FFF///F/A/F/
fastq illumina • 2.5k views

If this is real and reproducible, it sounds like a bug of some sort in the on-board data processing software on the MiniSeq. I would expect that to have been caught a long time ago, unless the MiniSeq you are using has not been updated.

It may also be worth emailing Illumina tech support if whoever produced the data has no satisfactory answer.

Can you post an example?

I have now added an example to the question.

The quality scores are slightly different, even though both records come from the same position on the flow cell. This is... odd. Was BQSR (base quality score recalibration) run and the files then merged, or something like that?
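
To see exactly where the two records disagree, a small Python sketch along these lines could list the differing positions. It assumes Phred+33 encoding (the usual encoding for current Illumina FASTQ output), and `quality_diffs` is just an illustrative helper name, not part of any tool mentioned in this thread. The two strings are copied from the zgrep output in the question.

```python
def quality_diffs(qual_a, qual_b, offset=33):
    """Return (position, phred_a, phred_b) wherever the two strings differ.

    Positions are 0-based. Assumes Phred+33 unless another offset is given.
    Only the overlapping part is compared if the lengths differ.
    """
    return [
        (i, ord(a) - offset, ord(b) - offset)
        for i, (a, b) in enumerate(zip(qual_a, qual_b))
        if a != b
    ]


# Quality lines of the two consecutive records shown in the question.
first = "AFFFFFFFFFFFFFF//=F//FFFFFF66AFAFFFFFFFF//FF6//F/F=FA//AFFFF/FF/FFF///FAA/F/"
second = "AFFFFFFFFFFFFFF//=F//FFFFFF66AF/FFFFFFFF//FF66/F6F=FF//AFFFF/FF/FFF///F/A/F/"

for pos, q_first, q_second in quality_diffs(first, second):
    print(f"position {pos}: Q{q_first} vs Q{q_second}")
```

If only a handful of positions differ while the bases are identical, that would be consistent with the same cluster having been base-called or scored twice rather than two distinct clusters, though this sketch alone cannot establish the cause.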

It's strange. Contact the people who provided the data and ask whether they have done any processing.

I have never seen anything like that using bcl2fastq2.

8.2 years ago

I contacted Illumina tech support and they said they will check the software on the machine.

Meanwhile, I implemented a -p flag in seqtk fqchk that reports read IDs repeated in consecutive records of an Illumina fastq.gz file:

git clone ssh://git@github.com/avilella/seqtk
cd seqtk
make
seqtk fqchk
Usage: seqtk fqchk [-q 20] <in.fq>
Options:
         -q INT      quality score
            Note: use -q0 to get the distribution of all quality values
         -p          check if previous read_id is same (Illumina fastq QC)

seqtk fqchk -q0 -p test.fq.gz 1>out.txt 2>err.txt

The err.txt file should be empty unless there are duplicates, in which case it contains lines like this:

cat err.txt
[stk_fqchk] Same sequence names in consecutive reads: MN00325:3:000H223KC:1:11106:5362:8398 == MN00325:3:000H223KC:1:11106:5362:8398
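
For anyone who would rather not build seqtk, roughly the same check can be sketched in Python. This is not the seqtk implementation, only an approximation of what the -p flag reports. It assumes strict four-line FASTQ records, and the names `consecutive_duplicate_ids`, `open_fastq` and `report` are hypothetical helpers made up for this example.

```python
import gzip
import sys


def consecutive_duplicate_ids(lines):
    """Yield read IDs that equal the ID of the immediately preceding record.

    Assumes strict four-line FASTQ records (header, sequence, '+', quality).
    Headers are located by line number rather than by a leading '@',
    because a quality line may also start with '@'.
    """
    previous = None
    for number, line in enumerate(lines):
        if number % 4 != 0:
            continue  # only every fourth line is a header
        # Drop the leading '@' and anything after the first whitespace.
        fields = line[1:].split()
        read_id = fields[0] if fields else ""
        if read_id == previous:
            yield read_id
        previous = read_id


def open_fastq(path):
    """Open a FASTQ file as text, decompressing if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def report(path):
    """Print one line to stderr per consecutive duplicate read ID."""
    with open_fastq(path) as handle:
        for read_id in consecutive_duplicate_ids(handle):
            print(f"Same read ID in consecutive records: {read_id}", file=sys.stderr)
```

Calling `report("test.fq.gz")` should print nothing for a clean file. Like the -p check, this only catches duplicates that are adjacent in the file; duplicates separated by other reads would need a set of seen IDs instead.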