Question

What does the name of a read mean? How do I find its location in a genome viewer?

5 months ago

sacryt • 0

I have a list of read names and I want to find where in the genome these reads lie, but I don't know how to do this. I am using UK Biobank sequencing data and viewing it in the Integrative Genomics Viewer (IGV).

The read names look like this:

A00715:256:HWYM5DSXY:4:2573:10131:31751
A00715:256:HWYM5DSXY:1:1427:23637:2879
A00715:256:HWYM5DSXY:3:1662:7030:36526

But I have no idea what these names mean, so I don't know where in the genome the reads are.

In IGV I have tried searching for parts of a name, but I don't think it finds the right place. E.g. for the first read I searched for just "4:2573", but this gave me a location on chr4 with no base calls.

Sorry for what is probably a very naive question, but I have no idea where to find out more about this. I don't know whether this is a UK Biobank-specific naming convention or a more universal standard.

ukb reads • 387 views

updated 5 months ago by Pierre Lindenbaum 164k • written 5 months ago by sacryt • 0

score 2 · Answer 1 · 2024-06-09

See the "Illumina sequence identifiers" section of the FASTQ format article on Wikipedia: https://en.wikipedia.org/wiki/FASTQ_format

A00715      the unique instrument name
256         the run id
HWYM5DSXY   the flowcell id
4           flowcell lane
2573        tile number within the flowcell lane
10131       'x'-coordinate of the cluster within the tile
31751       'y'-coordinate of the cluster within the tile
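
As a rough illustration of the field layout above, here is a minimal Python sketch that splits one of the read names from the question into those seven fields. The names `IlluminaReadName` and `parse_read_name` are made up for this example and are not part of any library. It assumes the name has exactly seven colon-separated fields, as in the question.

```python
from typing import NamedTuple


class IlluminaReadName(NamedTuple):
    """The seven colon-separated fields of an Illumina read name."""
    instrument: str   # unique instrument name
    run_id: int       # run id
    flowcell_id: str  # flowcell id
    lane: int         # flowcell lane
    tile: int         # tile number within the lane
    x: int            # x-coordinate of the cluster within the tile
    y: int            # y-coordinate of the cluster within the tile


def parse_read_name(name: str) -> IlluminaReadName:
    """Split a read name such as 'A00715:256:HWYM5DSXY:4:2573:10131:31751'.

    A leading '@' (as in a FASTQ header line) is dropped, and anything
    after the first whitespace is ignored.
    """
    core = name.strip().lstrip("@").split()[0]
    fields = core.split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(fields)}: {name!r}")
    instrument, run_id, flowcell_id, lane, tile, x, y = fields
    return IlluminaReadName(
        instrument, int(run_id), flowcell_id, int(lane), int(tile), int(x), int(y)
    )


parsed = parse_read_name("A00715:256:HWYM5DSXY:4:2573:10131:31751")
print(parsed.lane, parsed.tile)
```

Note that the "4" here is the flowcell lane, not chromosome 4, which is why searching IGV for "4:2573" lands somewhere unrelated.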

score 0 · Answer 2 · 2024-06-09

Those names are generated by the sequencing machine and say nothing about where a read falls in the genome.

The actual A's, T's, C's, G's in your sequencing files are all you need to look at. To figure out where a sequence exists along a chromosome, you have to map those nucleotides to a reference genome using a genome aligner tool such as bwa.

Basically, you have to run a tool on your sequencing files to produce an alignment file (such as a BAM file) -- that's what you load into IGV.
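
Once you have an alignment file, the position of each read is stored next to its name. As a hedged sketch (not the only way to do it), the Python below scans SAM-format text, for example the output of `samtools view` on your BAM, and reports the reference name and 1-based position for the read names you care about. The function name `locate_reads` and the sample record are made up for illustration. It relies only on the standard SAM column order (QNAME, FLAG, RNAME, POS).

```python
def locate_reads(sam_lines, wanted_names):
    """Return {read name: [(chromosome, position) or None, ...]}.

    sam_lines    -- iterable of SAM text lines (header lines are skipped)
    wanted_names -- read names to look for
    A read can appear several times (both mates of a pair, secondary
    alignments), so each name maps to a list. Unmapped records give None.
    """
    wanted = set(wanted_names)
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):      # SAM header line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4:             # not a valid alignment record
            continue
        qname, flag, rname, pos = cols[0], int(cols[1]), cols[2], int(cols[3])
        if qname not in wanted:
            continue
        unmapped = bool(flag & 0x4)   # FLAG bit 0x4 means "read unmapped"
        hits.setdefault(qname, []).append(None if unmapped else (rname, pos))
    return hits


# Illustrative record only: the chromosome and position are invented.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "A00715:256:HWYM5DSXY:4:2573:10131:31751\t99\tchr1\t12345\t60\t151M\t=\t12500\t306\t*\t*",
]
print(locate_reads(example, ["A00715:256:HWYM5DSXY:4:2573:10131:31751"]))
```

The chromosome and position it reports can then be typed into the IGV search box (e.g. `chr1:12345`) to jump to the read.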