UMI in seqc2 dataset Burning Rock data


19 months ago

PeterWu ▴ 20

I'm working on the seqc2 liquid biopsy dataset (https://www.nature.com/articles/s41587-021-00857-z; a detailed description of the data can be found in https://www.nature.com/articles/s41597-022-01276-8). When looking at the Burning Rock sequencing data (SRR13200965, https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR13200965&display=reads), I see read identifiers like:

gnl|SRA|SRR13200965.1.1A00463:54:H7JGKDMXX:1:1101:1344:1000:JEYSRQSD+VSTREIPV

And they describe their read processing as follows:

... After demultiplex and moving 6-bp UMI to the sequence header using bcl2fastq [14] v2.20 (Illumina) ...

So my question is: how do I get the UMI? I guess the 'JEYSRQSD+VSTREIPV' part of the read identifier is relevant, but those characters are not bases, nor do they have length 6. How should I interpret them? Thanks.
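A minimal Python sketch for inspecting the identifier, assuming the standard Illumina read-name layout (`instrument:run:flowcell:lane:tile:x:y`), where bcl2fastq places an appended UMI in an extra colon-separated field after the y coordinate. The field names and helper functions are my own labels, not taken from the dataset's documentation. It only makes the observation in the question concrete: the last field sits where a UMI would be, but it is two 8-character parts that are not A/C/G/T/N, so it cannot be read directly as a 6-bp base sequence.

```python
# Illumina-style read-name fields; the optional 8th field is where
# bcl2fastq appends a UMI (assumed standard layout).
FIELD_NAMES = ["instrument", "run", "flowcell", "lane", "tile", "x", "y", "umi"]
DNA = set("ACGTN")


def parse_read_name(read_name: str) -> dict:
    """Split an Illumina-style read name into named fields.

    The first field may carry an SRA prefix (e.g. 'gnl|SRA|...') fused to
    the instrument ID; it is kept verbatim.
    """
    parts = read_name.split(":")
    if len(parts) not in (7, 8):
        raise ValueError(f"expected 7 or 8 colon-separated fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def looks_like_bases(s: str) -> bool:
    """True if s is non-empty and made only of A/C/G/T/N."""
    return bool(s) and set(s) <= DNA


name = "gnl|SRA|SRR13200965.1.1A00463:54:H7JGKDMXX:1:1101:1344:1000:JEYSRQSD+VSTREIPV"
fields = parse_read_name(name)
umi_parts = fields["umi"].split("+")
print(fields["umi"])
print([len(p) for p in umi_parts])
print([looks_like_bases(p) for p in umi_parts])
```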

sequencing UMI • 881 views

updated 19 months ago by LChart 4.6k • written 19 months ago by PeterWu ▴ 20


That's likely the raw read, and the UMI is likely still contained in the sequence itself. So to get the UMI, repeat their approach and use bcl2fastq.

19 months ago by LChart 4.6k


Thanks for the reply. I'm a bit confused: isn't bcl2fastq a tool for converting BCL files to FASTQ? I don't think it can do things like 'moving the first 6 bases to the read header' starting from a FASTQ file.
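If the UMI really is still at the start of the read sequence (as suggested in the reply above; not verified for these SRA reads), it can be moved into the header directly from FASTQ without bcl2fastq. Below is a minimal Python sketch, assuming a plain 4-line-per-record FASTQ and a 6-bp UMI at the 5' end of each read; `read_fastq` and `move_umi_to_header` are hypothetical helper names and the example record is made up. It cuts the first 6 bases and their quality scores and appends the UMI as an extra colon-separated field on the read name, mimicking a bcl2fastq-style header.

```python
import io
from typing import Iterator, TextIO, Tuple

UMI_LEN = 6  # assumption: the UMI is the first 6 bases of each read


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def move_umi_to_header(header: str, seq: str, qual: str,
                       umi_len: int = UMI_LEN) -> Tuple[str, str, str]:
    """Cut the first umi_len bases (and qualities) and append them to the read name."""
    umi = seq[:umi_len]
    # Keep the optional comment (e.g. '1:N:0:ACGT') that follows the first space.
    name, sep, comment = header.partition(" ")
    return f"{name}:{umi}{sep}{comment}", seq[umi_len:], qual[umi_len:]


# Hypothetical record, only for illustration.
example = "@A00463:54:H7JGKDMXX:1:1101:1344:1000 1:N:0:ACGT\nACGTACGGGTTTAAA\n+\nFFFFFFFFFFFFFFF\n"
for h, s, q in read_fastq(io.StringIO(example)):
    print(h, s, q, sep="\n")
```

For real data, UMI-tools (`umi_tools extract`) is an existing tool built for this task and is likely a safer choice than a hand-rolled script.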