I'm trying to use data from the following series: GSE204761. This is my first time working with scATAC-seq data, and I'm a bit confused by how the data is formatted.
I understand that sequencing produces BCL files, which must then be demultiplexed. When I download the FASTQ files from SRA (specifically GSM6190720 - GSM6190725), they download as paired FASTQ files. However, to demultiplex the data with cellranger, don't the files need to be in BCL format? I know I'm doing something wrong, because I can't imagine why the authors would upload unusable data.
I'm also confused about the difference between P1.10.18.rep1.mm10 and Pool.P1.rep1.mm10. I don't know if this is unique to the paper, but I thought I'd ask because they don't clarify what the difference is anywhere that I could find.
The processed data is provided in the series. However, I'm having trouble with features being given as genomic regions instead of gene names, so I thought I'd learn how to process the data from the beginning. Thanks!
Edit: I should add that the reason I'm asking is that I tried running cellranger-atac count with the downloaded FASTQ files, but I got the following error:
Log message:
Unable to read barcode sequence for read ID SRR19391837.1 1 length=43: there was no I2 read FASTQ and we were unable to read a 16-base barcode from the FASTQ header. Make sure that the flow cell was demultiplexed correctly.
Perhaps this error and my question are related, but I don't know.
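In case it helps anyone diagnose this, here is a minimal sketch (Python, standard library only) of a check that could be run on one of the downloaded files. It reads the first record and reports the header, the read length, and whether the header contains anything that looks like a 16-base barcode. The function names and the file name in the usage note are hypothetical. The pattern used to spot a barcode is my own guess at what one would look like in a header, not cellranger-atac's actual parsing rule.

```python
import gzip
import re

# Assumption: a barcode embedded in a FASTQ header would appear as a
# standalone run of exactly 16 A/C/G/T/N characters. This is a guess for
# illustration only, not the rule cellranger-atac actually applies.
BARCODE_RE = re.compile(r"(?<![ACGTN])[ACGTN]{16}(?![ACGTN])")


def find_barcode_in_header(header):
    """Return a 16-base barcode-like token from a header line, or None."""
    match = BARCODE_RE.search(header)
    return match.group(0) if match else None


def first_record(path):
    """Return (header, sequence) for the first record of a FASTQ file."""
    # Handle both plain and gzip-compressed FASTQ files.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().rstrip("\n")
        sequence = handle.readline().rstrip("\n")
    return header, sequence


def describe(path):
    """Summarise the first record: header, read length, barcode-like token."""
    header, sequence = first_record(path)
    return {
        "header": header,
        "read_length": len(sequence),
        "barcode_in_header": find_barcode_in_header(header),
    }
```

Calling `describe("SRR19391837_1.fastq")` (a hypothetical file name) on each file of a pair would show which read lengths are present and whether any header carries a barcode-like token. The header quoted in the error above, `SRR19391837.1 1 length=43`, contains no such token under this pattern.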