If you follow this link to the Human Cell Atlas, you'll be taken to the project's metadata page, where there are two files. The pertinent one is the tsv file, which you can download. Sample Hu5_ATAC has 16 fastq files. They fall into sets of four read types (I1, R1, R2, R3), 2 lanes (L001 & L002) and 2 other designations, 'a' & 'b'. The 'a' & 'b' samples each have their own bundle_uuid.
All 16 fastq files have the same SRX value, and on the SRX page on GEO, 'a' and 'b' don't appear to be distinguished from each other any more than the two lanes are within this SRX. Looking at the SRRs themselves, it looks like 'a' & 'b' were sequenced at different xy positions in the flow cell.
Two questions:
What is a bundle_uuid for this project? What does it mean?
And when analyzing this, should I just concatenate all the files into 4 merged files on read values and then proceed?
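For reference, the merge being asked about could be sketched as below. Gzip files can be concatenated byte for byte and still form a valid gzip stream, so no decompression is needed. The function name `concat_fastqs` is hypothetical, and whether the 'a' and 'b' files belong in the same merged file is exactly the open question.

```python
import shutil
from pathlib import Path


def concat_fastqs(inputs, output):
    """Append each input file to `output`, in the order given.

    Works for .fastq.gz files because concatenated gzip members form a
    valid gzip stream. Use the same input order for every read type
    (I1, R1, R2, R3) so records stay in sync across the merged files.
    """
    output = Path(output)
    with output.open("wb") as dst:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream the bytes across without loading the file in memory.
                shutil.copyfileobj(src, dst)
    return output
```

It would be called once per read type, passing the lane files in the same order each time.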
Also, possibly helpful snippets from the methods:
Samples were sequenced on either a NovaSeq SP or NovaSeq S1 100-cycle flow cell, depending on how many samples were sequenced at a given time (SP for 4 or under, S1 for 5-9)
Illumina NovaSeq 6000 - scATAC-seq 10x
sequencing_protocol_method_ontology: EFO:0010891
Thank you!
SRA Run Selector shows a better view of the metadata: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA761679&o=acc_s%3Aa
Thank you, but I've already checked out that site to see if I was missing anything and I don't see any answers to my questions there.
Looking at the code here, bundle_uuid seems to be related to the submissions to HCA. Samples that were run on multiple lanes seem to have the same bundle_uuid. If the "a" and "b" samples have different bundle_uuids, then you will likely want to keep them separate. Samples do not have an X/Y position within a single flowcell. Emailing the HCA may be the best option to get confirmation.
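To check how the files group before merging anything, a minimal sketch like the following could be run against the downloaded tsv. The column names (`bundle_uuid`, `read_index`, `lane_index`, `file_name`) and the function name `group_fastqs` are assumptions, not confirmed against the actual file, so adjust them to whatever the header really contains.

```python
import csv
from collections import defaultdict


def group_fastqs(tsv_handle, bundle_col="bundle_uuid", read_col="read_index",
                 lane_col="lane_index", name_col="file_name"):
    """Map (bundle_uuid, read type) to file names ordered by lane.

    All column names are assumptions; change the defaults to match the
    header of the real metadata tsv.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        key = (row[bundle_col], row[read_col])
        groups[key].append((row[lane_col], row[name_col]))
    # Lanes are sorted as strings, which is fine for a handful of lanes.
    return {key: [name for _, name in sorted(files)]
            for key, files in groups.items()}
```

If 'a' and 'b' really are separate bundles, the 16 files for this sample should come out as 8 groups (2 bundles x 4 read types) of 2 lane files each, and each group is one candidate set to concatenate.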