Illumina paired end fastq sequence identifiers and index primers
3.4 years ago
wormball ▴ 10

Hello!

I have some paired-end Illumina fastq files. In most of them the sequence identifiers look like this:

....
@GENOTEK:000:311CE525F:3:1101:17996:1000 1:N:0:TCTTCACA+ATTACTCG
@GENOTEK:000:311CE525F:3:1101:21938:1000 1:N:0:TCTTCACA+ATTACTCG
@GENOTEK:000:311CE525F:3:1101:1208:1016 1:N:0:TCTTCACA+ATTACTCG
@GENOTEK:000:311CE525F:3:1101:3558:1016 1:N:0:TCTTCACA+ATTACTCG
....

As far as I understand, TCTTCACA+ATTACTCG are the first and second index primers, which are attached to the fragment to differentiate one end from the other.

But at least one pair of files has identifiers like this:

....
@GENOTEK:000:9589D2457:7:1101:12895:1362 1:N:0:NTTACTCG
@GENOTEK:000:9589D2457:7:1101:16011:1379 1:N:0:NTTACTCG
@GENOTEK:000:9589D2457:7:1101:17381:1432 1:N:0:NTTACTCG
....

....
@GENOTEK:000:9589D2457:7:1101:12895:1362 2:N:0:NTTACTCG
@GENOTEK:000:9589D2457:7:1101:16011:1379 2:N:0:NTTACTCG
@GENOTEK:000:9589D2457:7:1101:17381:1432 2:N:0:NTTACTCG
....

So these contain only one index primer, and moreover, it is the same in both files. Does this mean it is impossible to distinguish one end of the fragment from the other, so that these are effectively single-end reads?

Also, all the files have run number 000. Is this something to worry about?

Thanks in advance.

fastq primers identifiers Illumina • 2.1k views
3.4 years ago
GenoMax 147k

@GENOTEK:000:9589D2457:7:1101:12895:1362 1:N:0:NTTACTCG <-- This dataset uses a single index.

@GENOTEK:000:311CE525F:3:1101:3558:1016 1:N:0:TCTTCACA+ATTACTCG <-- This dataset uses two indexes.

In Illumina sequencing, index reads are never part of the actual sequence; they are read independently. Indexes have nothing to do with distinguishing one end of a fragment from the other. If you have paired-end sequencing data, each fragment is sampled from both ends. If you have single-end sequencing data, the fragment is sampled from only one end. In both cases you can have a single index or two indexes. Indexes simply label samples so that reads can be separated bioinformatically after the run.
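To make the distinction concrete, here is a minimal Python sketch that splits an identifier into its fields. It assumes the usual Casava 1.8+ layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`); the names `ReadHeader`, `parse_header` and `same_fragment` are made up for this illustration and are not part of any Illumina tool. The point is that the read number (1 or 2) in the second part of the identifier marks which end of the fragment a read comes from, while the index field only says which sample it belongs to.

```python
from typing import NamedTuple, Tuple


class ReadHeader(NamedTuple):
    instrument: str
    run_number: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int                 # 1 or 2: which end of the fragment was read
    is_filtered: str          # "Y" if the read failed the filter, else "N"
    control: str
    indexes: Tuple[str, ...]  # one entry (single index) or two (dual index)


def parse_header(line: str) -> ReadHeader:
    """Parse one identifier line, assuming the Casava 1.8+ layout."""
    name, comment = line.rstrip("\n").lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = comment.split(":")
    return ReadHeader(
        instrument, run, flowcell, int(lane), int(tile), int(x), int(y),
        int(read), is_filtered, control,
        tuple(index.split("+")),  # dual indexes are joined with "+"
    )


def same_fragment(a: ReadHeader, b: ReadHeader) -> bool:
    """True if a and b are the two mates (read 1 and read 2) of one fragment."""
    # The first seven fields identify the cluster; only the read number differs.
    return a[:7] == b[:7] and {a.read, b.read} == {1, 2}


r1 = parse_header("@GENOTEK:000:9589D2457:7:1101:12895:1362 1:N:0:NTTACTCG")
r2 = parse_header("@GENOTEK:000:9589D2457:7:1101:12895:1362 2:N:0:NTTACTCG")
dual = parse_header("@GENOTEK:000:311CE525F:3:1101:3558:1016 1:N:0:TCTTCACA+ATTACTCG")

print(len(r1.indexes), len(dual.indexes))  # number of indexes in each dataset
print(same_fragment(r1, r2))               # mates share a name, differ in read number
```

With the identifiers from the question, the single-index reads still pair up as read 1 and read 2 of the same cluster, so they are ordinary paired-end data.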

Regarding the run number 000: that should not be a cause for concern. My assumption is that the name may have been changed afterwards.
