Question

Is it possible to Demultiplex FASTQ files?

23 months ago

James • 0

I'm trying to use data from the following series: GSE204761. I'm a bit confused by the formatting of scATAC-Seq data. This is my first time working with scATAC-Seq data.

I realize that sequencing produces BCL files, which must then be demultiplexed. When I download the FASTQ files from SRA, specifically GSM6190720 - GSM6190725, they download as paired FASTQ files. However, to demultiplex the data, don't they need to be in BCL format to use cellranger? I know I'm doing something wrong because I can't imagine why they would upload unusable data.

I'm also confused about the difference between P1.10.18.rep1.mm10 and Pool.P1.rep1.mm10. I don't know if this naming is unique to the paper, but I thought I'd ask because they don't clarify the difference anywhere that I could find.

The processed data is provided in the series; however, I'm having trouble with its use of genome regions instead of gene names, so I thought I'd learn how to process the data from the beginning. Thanks!

Edit: I should add that the reason I'm asking is that I tried using cellranger-atac count with the downloaded FASTQ files, but I got the following error:

Log message: Unable to read barcode sequence for read ID SRR19391837.1 1 length=43: there was no I2 read FASTQ and we were unable to read a 16-base barcode from the FASTQ header. Make sure that the flow cell was demultiplexed correctly.

Perhaps this error and my question are related, but I don't know.

scATAC-Seq

updated 23 months ago by GenoMax 147k • written 23 months ago by James • 0

score 0 · Answer 1 · 2022-12-26

Reads uploaded to SRA are almost always already demultiplexed. You don't need to demultiplex them.

If I understand correctly, Cell Ranger doesn't work because it doesn't like how the headers of reads look. Try downloading reads with "fastq-dump --origfmt". This will keep the original headers of reads. If this doesn't solve the problem, it's probably worth writing to the authors of the original paper.
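Before re-downloading, it may help to check what kind of read names your FASTQ files actually carry. Below is a minimal Python sketch that classifies a header line as SRA-style (like the "SRR19391837.1 1 length=43" ID in the error above) or original Illumina-style. The function names and the regular expressions are my own illustration, assuming the usual seven colon-separated Illumina fields (instrument:run:flowcell:lane:tile:x:y). This is not how Cell Ranger itself parses headers.

```python
import gzip
import re

# Assumption: original Illumina (Casava 1.8+) read names begin with seven
# colon-separated fields: instrument:run:flowcell:lane:tile:x:y
ILLUMINA_NAME = re.compile(r"^@[^\s:]+:\d+:[^\s:]+:\d+:\d+:\d+:\d+")

# Assumption: SRA-style names begin with a run accession (SRR/ERR/DRR)
# followed by a dot and the spot number, e.g. "@SRR19391837.1 ..."
SRA_NAME = re.compile(r"^@[SED]RR\d+\.\d+(\s|$)")


def classify_header(header_line):
    """Return 'sra', 'illumina', or 'unknown' for one FASTQ header line."""
    line = header_line.strip()
    if SRA_NAME.match(line):
        return "sra"
    if ILLUMINA_NAME.match(line):
        return "illumina"
    return "unknown"


def first_header(fastq_path):
    """Read the first line of a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        return handle.readline()
```

For example, `classify_header(first_header("reads_1.fastq"))` (a hypothetical file name) returning "sra" would mean the original Illumina names are not in the headers of that file.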

score 0 · Answer 2 · 2022-12-26

Looks like the submitters did not provide original Illumina FASTQ-formatted data for this set.

Your best bet is to download the original BAM format files they provided with the data from: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR19391852&display=data-access (look in the data access tab). Then use the bamtofastq (LINK) tool provided by 10x to reconstruct the original FASTQ files in the correct format.
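If you want to script the conversion step, here is a rough Python sketch that only assembles the bamtofastq command line. It assumes the tool takes the input BAM and an output directory as positional arguments, and that the optional `--nthreads` flag is available in your version (check `bamtofastq --help`). The file names are placeholders, not the real names from this series.

```python
import shlex


def bamtofastq_command(bam_path, out_dir, threads=None):
    """Build the argument list for 10x's bamtofastq (does not run it)."""
    cmd = ["bamtofastq"]
    if threads is not None:
        # Assumption: --nthreads=<n> is supported by the installed version.
        cmd.append(f"--nthreads={threads}")
    # Positional arguments: input BAM first, then the output directory.
    cmd += [bam_path, out_dir]
    return cmd


# Placeholder paths for illustration only.
cmd = bamtofastq_command("sample.bam", "fastq_out", threads=4)
print(shlex.join(cmd))
```

Once the printed command looks right, you could pass the same list to `subprocess.run(cmd, check=True)` to actually run it.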