Parsing flowcell from HiSeq 2000 read name
7.7 years ago
steven.davis ▴ 10

Is there a way to extract instrument and flowcell from a read name like this:

>gnl|SRA|SRR1840614.1.1 FCC1KPRACXX:1:1101:1291:2172

FCC1KPRACXX : what is this?
1 = lane?
1101 = ?
1291 = x
2172 = y

I need to extract the flowcell, if possible, so that it can be assigned to reads downstream in a SAM file read-group tag.

illumina fastq hiseq • 2.4k views

Pierre's description of the numbers is correct. Additionally, I suspect it is theoretically possible to convert "FCC1KPRACXX" into an instrument, but I'm not aware of a tool that does that. That is a string that Illumina sticks on the beginning of sequence identifiers and certainly has meaning to them; so, I suggest you contact Illumina and ask them how to translate it into a specific instrument. If you get a useful response, I'd encourage you to post it here.

7.7 years ago

https://en.wikipedia.org/wiki/FASTQ_format

FCC1KPRACXX : the flowcell id

1 = flowcell lane

1101 = tile number within the flowcell lane

1291 = 'x'-coordinate of the cluster within the tile

2172 = 'y'-coordinate of the cluster within the tile
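
As a minimal sketch of how these fields could be pulled apart in Python: the pattern below assumes the read name ends with a colon-separated `flowcell:lane:tile:x:y` token, as in the example from the question. The names `READ_NAME_RE` and `parse_read_name` are made up for illustration and do not come from any existing library.

```python
import re

# Assumption: the header ends with "<flowcell>:<lane>:<tile>:<x>:<y>",
# as in the SRA-style example from the question.
READ_NAME_RE = re.compile(
    r"(?P<flowcell>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)$"
)


def parse_read_name(header):
    """Return a dict of the parsed fields, or None if the header does not match."""
    match = READ_NAME_RE.search(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Everything except the flowcell id is numeric.
    for key in ("lane", "tile", "x", "y"):
        fields[key] = int(fields[key])
    return fields


header = ">gnl|SRA|SRR1840614.1.1 FCC1KPRACXX:1:1101:1291:2172"
print(parse_read_name(header))
```

Headers that do not follow this layout return `None`, so they can be skipped or reported instead of being silently mis-parsed.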


Thank you Pierre. I'm writing a Python module to parse the sequence ids and extract various data elements. I will put it on GitHub sometime soon.
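
For the read-group use case mentioned in the question, one possible approach is to combine flowcell and lane into the `ID` and `PU` fields of a SAM `@RG` header line. This is only a sketch: the function name `read_group_line` and the sample name `sample1` are hypothetical, and the `flowcell.lane` naming is a common convention rather than a requirement.

```python
def read_group_line(flowcell, lane, sample):
    """Build a tab-separated SAM @RG header line from flowcell and lane.

    Assumption: one read group per flowcell lane, named "<flowcell>.<lane>".
    """
    rg_id = f"{flowcell}.{lane}"
    return "\t".join(
        [
            "@RG",
            f"ID:{rg_id}",   # read-group identifier
            "PL:ILLUMINA",   # sequencing platform
            f"PU:{rg_id}",   # platform unit (flowcell and lane)
            f"SM:{sample}",  # sample name, supplied by the caller
        ]
    )


print(read_group_line("FCC1KPRACXX", 1, "sample1"))
```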
