UMIs in the SEQC2 dataset: Burning Rock data
PeterWu, 19 months ago

I'm working with the SEQC2 liquid biopsy dataset (https://www.nature.com/articles/s41587-021-00857-z; a detailed description of the data is in https://www.nature.com/articles/s41597-022-01276-8). In the Burning Rock sequencing data (SRR13200965, https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR13200965&display=reads), the read identifiers look like this:

gnl|SRA|SRR13200965.1.1A00463:54:H7JGKDMXX:1:1101:1344:1000:JEYSRQSD+VSTREIPV

The authors describe their read processing as follows:

... After demultiplex and moving 6-bp UMI to the sequence header using bcl2fastq [14] v2.20 (Illumina) ...

So my question is: how do I get the UMI? I guess the 'JEYSRQSD+VSTREIPV' part of the read identifier is relevant, but it is not made of bases, and neither part is 6 characters long. How should I interpret this? Thanks.
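The observation in the question can be checked with a minimal Python sketch. It assumes the Illumina/bcl2fastq header convention in which the UMI, when present, is the 8th colon-separated field of the read name, with the two parts of a dual UMI joined by '+'. It only reports each part's length and whether it consists of A/C/G/T/N; it does not explain what the letters in this SRA record mean. The function name describe_umi_field is hypothetical.

```python
# Hypothetical helper: inspect the UMI field of an Illumina-style read name.
VALID_BASES = set("ACGTN")

# Identifier copied from the SRA run browser for SRR13200965.
READ_ID = "gnl|SRA|SRR13200965.1.1A00463:54:H7JGKDMXX:1:1101:1344:1000:JEYSRQSD+VSTREIPV"


def describe_umi_field(read_id):
    """Return one dict per '+'-separated UMI part, or None if there is no 8th field.

    Assumption: the read name follows instrument:run:flowcell:lane:tile:x:y[:UMI],
    optionally followed by a whitespace-separated FASTQ comment.
    """
    name = read_id.lstrip("@").split()[0]  # drop the FASTQ comment, if any
    fields = name.split(":")
    if len(fields) < 8:
        return None  # no UMI field in this header
    return [
        {"umi": part, "length": len(part), "is_dna": set(part) <= VALID_BASES}
        for part in fields[7].split("+")
    ]


if __name__ == "__main__":
    for part in describe_umi_field(READ_ID):
        print(part)
```

For READ_ID, both parts come back as 8 characters long with is_dna set to False, consistent with the question: the field sits where bcl2fastq would put a UMI, but its content is not a 6-base DNA sequence.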


That's likely the raw read, and the UMI is probably still contained in the read sequence itself. So to get the UMI, repeat their approach and use bcl2fastq.


Thanks for the reply. I'm a bit confused: isn't bcl2fastq a tool that converts BCL files to FASTQ? I don't think it can do things like 'moving the first 6 bases to the read headers' when starting from FASTQ files.


True. You could potentially use UMI-tools (umi_tools extract) to do this.
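With UMI-tools, the command would be along the lines of umi_tools extract --bc-pattern=NNNNNN --stdin=R1.fastq.gz --stdout=R1.umi.fastq.gz (with --read2-in/--read2-out for paired-end data). To show what that step does, here is a minimal Python sketch that moves the first 6 bases of each read into the read name, UMI-tools style (name_UMI). It assumes the 6-bp UMI sits at the start of the read in the deposited FASTQ; the paper only says "6-bp UMI", so check that the UMI bases were not already trimmed before using this. The function names are hypothetical.

```python
import gzip


def extract_umi_record(header, seq, qual, umi_len=6):
    """Move the first umi_len bases (and their qualities) into the read name.

    Assumption: the UMI occupies the first umi_len bases of the read.
    The UMI is appended to the name as "_UMI", before any FASTQ comment.
    """
    name, _, comment = header.partition(" ")
    umi = seq[:umi_len]
    new_header = f"{name}_{umi}" + (f" {comment}" if comment else "")
    return new_header, seq[umi_len:], qual[umi_len:]


def extract_umis(in_path, out_path, umi_len=6):
    """Stream a 4-line FASTQ (optionally gzipped) and write UMI-tagged reads."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = fin.readline().rstrip("\n")
            plus = fin.readline().rstrip("\n")
            qual = fin.readline().rstrip("\n")
            h, s, q = extract_umi_record(header, seq, qual, umi_len)
            fout.write(f"{h}\n{s}\n{plus}\n{q}\n")
```

For example, extract_umi_record("@r1 1:N:0:1", "ACGTACGGGTTT", "IIIIIIJJJJJJ") yields the name "@r1_ACGTAC 1:N:0:1" and the trimmed read "GGGTTT". In practice umi_tools extract is the better choice, since it also handles paired reads and regex patterns.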
