Question

What does 'bundle_uuid' refer to in this metadata sheet?

15 months ago

jeffrey.maurer.informatics • 0

If you follow this link to the Human Cell Atlas, you'll be taken to the project's metadata page, which offers two files. The pertinent one is the downloadable TSV file. In it, sample Hu5_ATAC has 16 FASTQ files: four read types (I1, R1, R2, R3) across 2 lanes (L001 & L002) and 2 further designations, 'a' & 'b'. The 'a' and 'b' file sets each have their own bundle_uuid.

All 16 FASTQ files share the same SRX value, and its SRX page on GEO does not appear to distinguish 'a' from 'b' any more than it distinguishes one lane from the other. Looking at the SRR runs themselves, it looks like 'a' & 'b' were sequenced at different x/y positions in the flow cell.

Two questions:

What is a bundle_uuid for this project? What does it mean?

And when analyzing this sample, should I just concatenate all the files into 4 merged files, one per read type (I1, R1, R2, R3), and then proceed?

Also, possibly helpful snippets from the methods:

Samples were sequenced on either a NovaSeq SP or NovaSeq S1 100-cycle flow cell, depending on how many samples were sequenced at a given time (SP for 4 or under, S1 for 5-9)
- Illumina NovaSeq 6000 - scATAC-seq 10x
- sequencing_protocol_method_ontology: EFO:0010891

Thank you!

SRA illumina GEO metadata • 799 views

Updated 15 months ago by GenoMax • written 15 months ago by jeffrey.maurer.informatics

SRA Run Selector shows a better view of the metadata: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA761679&o=acc_s%3Aa

Reply 15 months ago by GenoMax

Thank you, but I've already checked out that site to see if I was missing anything and I don't see any answers to my questions there.

Reply 15 months ago by jeffrey.maurer.informatics

Looking at the code here, bundle_uuid seems to be related to the submissions to HCA. Samples that ran on multiple lanes seem to share the same bundle_uuid, so if the "a" and "b" samples have different bundle_uuid values, you will likely want to keep them separate. Note that samples do not have an X/Y position within a single flowcell.

Emailing the HCA may be the best option to get confirmation.

0f766318-c7fc-48f3-8b5d-9028808da7e9    2024-03-06T14:04:05.954000Z 7041431b-9398-4773-a8e7-5de142174d79    sequence_file   snRNA_wt12-2_WT2_S3_L001_I1_001.fastq.gz
0f766318-c7fc-48f3-8b5d-9028808da7e9    2024-03-06T14:04:05.954000Z 7311416c-9278-491e-90ea-a081a0f29d2a    sequence_file   snRNA_wt12-2_WT2_S3_L002_R2_001.fastq.gz
0f766318-c7fc-48f3-8b5d-9028808da7e9    2024-03-06T14:04:05.954000Z 74c89f0b-851b-4feb-ba8d-bd4264c2bd28    sequence_file   snRNA_wt12-2_WT2_S3_L001_R1_001.fastq.gz
0f766318-c7fc-48f3-8b5d-9028808da7e9    2024-03-06T14:04:05.954000Z 9723c9ae-49d0-4f5d-87d7-255a0f5b2399    sequence_file   snRNA_wt12-2_WT2_S3_L002_R1_001.fastq.gz
0f766318-c7fc-48f3-8b5d-9028808da7e9    2024-03-06T14:04:05.954000Z a976ac39-36ee-49c3-9a2a-22ede92c80b7    sequence_file   snRNA_wt12-2_WT2_S3_L002_I1_001.fastq.gz
0f766318-c7fc-48f3-8b5d-9028808da7e9    2024-03-06T14:04:05.954000Z ae48884a-054e-4ea2-8b0c-98b1f27181fe    sequence_file   snRNA_wt12-2_WT2_S3_L001_R2_001.fastq.gz
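
If you do decide to merge lanes while keeping each bundle_uuid separate, a minimal Python sketch could look like the following. It assumes the five columns in the rows above are, in order, bundle_uuid, a timestamp, a per-file UUID, the file type, and the file name; that column order is inferred from the rows and is not confirmed by HCA. The helper names `merge_plan` and `concatenate` are hypothetical. The file-name pattern assumes Illumina-style names ending in `_L00X_<read>_001.fastq.gz`.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Matches the lane (e.g. L001) and read type (e.g. R1, I1) at the end of an
# Illumina-style FASTQ name such as ..._S3_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(r"_(L\d{3})_([IR]\d)_\d{3}\.fastq\.gz$")


def merge_plan(rows):
    """Map (bundle_uuid, read) -> file names sorted by lane.

    Assumed column order (inferred, not confirmed by HCA):
    bundle_uuid, timestamp, file_uuid, file_type, file_name
    """
    groups = defaultdict(list)
    for line in rows:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        bundle_uuid, _timestamp, _file_uuid, _file_type, file_name = fields
        match = FASTQ_NAME.search(file_name)
        if match is None:
            continue  # not a lane-split FASTQ file
        lane, read = match.groups()
        groups[(bundle_uuid, read)].append((lane, file_name))
    # Sorting by lane keeps R1/R2/I1 records in the same order across files.
    return {
        key: [name for _lane, name in sorted(files)]
        for key, files in groups.items()
    }


def concatenate(parts, destination, source_dir="."):
    """Concatenate gzip files byte-for-byte.

    Back-to-back gzip members form a valid multi-member gzip stream,
    so no decompression is needed.
    """
    with open(destination, "wb") as out:
        for name in parts:
            with open(Path(source_dir) / name, "rb") as src:
                shutil.copyfileobj(src, out)
```

Grouping on the pair (bundle_uuid, read) means files from different bundles are never merged together, while lanes within a bundle are joined in the same order for every read type. Whether 'a' and 'b' should in fact stay separate is still a question for HCA.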

Reply 15 months ago by GenoMax