Hi,
I recently received Drop-seq data back, but I'm having a little trouble interpreting the raw data format. I have two reads: one is 20 bp, which I assume is the cell barcode concatenated with the molecular barcode (UMI). The other is 62 bp, and that's the one I'm having trouble understanding. Are its first 12 bp the cell barcode again? Also, what does it mean to demultiplex? Isn't each cell supposed to have a unique barcode? If so, why does demultiplexing rely on a specific barcode for each condition?
Thanks
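For reference, in the standard Drop-seq bead design read 1 is 20 bp: a 12 bp cell barcode followed by an 8 bp UMI. A minimal Python sketch of splitting it, assuming that layout (the constant and function names are illustrative, not from any Drop-seq tool):

```python
# Assumption: standard Drop-seq read 1 = 12 bp cell barcode + 8 bp UMI.
CELL_BARCODE_LEN = 12
UMI_LEN = 8


def split_read1(seq: str) -> tuple:
    """Split a Drop-seq read-1 sequence into (cell_barcode, umi)."""
    if len(seq) < CELL_BARCODE_LEN + UMI_LEN:
        raise ValueError(f"read 1 too short: {len(seq)} bp")
    barcode = seq[:CELL_BARCODE_LEN]
    umi = seq[CELL_BARCODE_LEN:CELL_BARCODE_LEN + UMI_LEN]
    return barcode, umi


if __name__ == "__main__":
    bc, umi = split_read1("ACGTACGTACGTTTGGCCAA")
    print(bc, umi)  # ACGTACGTACGT TTGGCCAA
```

Under this layout the cell barcode lives only in the 20 bp read; the longer read would not repeat it.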
I am using the standard protocol. Your solution suggests a 70 bp read, but I have 62. I am more concerned with demultiplexing, however.
Do you have just two files? Can you post a few example FASTQ headers/reads from them? It seems to me that your first file may have the barcode+UMI and the second file the actual sequence data.
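If that is the layout, the two files are mates of the same read pairs, listed in the same order. A minimal sketch of pairing them, assuming the standard Drop-seq layout (read 1 = 12 bp barcode + 8 bp UMI, read 2 = cDNA) and plain 4-line FASTQ records; all function names here are hypothetical:

```python
from typing import Iterable, Iterator, Tuple

# Assumed layout (standard Drop-seq): read 1 = 12 bp cell barcode + 8 bp UMI,
# read 2 = cDNA. Adjust the lengths if your library differs.
BARCODE_LEN = 12
UMI_LEN = 8


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from plain 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip()
        if not header:
            continue  # tolerate blank lines between or after records
        if not header.startswith("@"):
            raise ValueError(f"expected FASTQ header, got {header!r}")
        try:
            seq, _plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header[1:], seq.rstrip(), qual.rstrip()


def mate_name(name: str) -> str:
    """Drop the comment field and any /1 or /2 suffix so mates compare equal."""
    core = name.split()[0]
    return core[:-2] if core.endswith(("/1", "/2")) else core


def pair_reads(r1_lines, r2_lines):
    """Yield (cell_barcode, umi, cdna_sequence) for each read pair.

    Records are paired by position, so both files must list reads in the
    same order; zip() stops at the end of the shorter file.
    """
    for (n1, s1, _), (n2, s2, _) in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if mate_name(n1) != mate_name(n2):
            raise ValueError(f"mates out of sync: {n1!r} vs {n2!r}")
        yield s1[:BARCODE_LEN], s1[BARCODE_LEN:BARCODE_LEN + UMI_LEN], s2


if __name__ == "__main__":
    r1 = ["@read1 1:N:0:1", "ACGTACGTACGTTTGGCCAA", "+", "IIIIIIIIIIIIIIIIIIII"]
    r2 = ["@read1 2:N:0:1", "GATTACAGATTACA", "+", "IIIIIIIIIIIIII"]
    print(list(pair_reads(r1, r2)))
    # [('ACGTACGTACGT', 'TTGGCCAA', 'GATTACAGATTACA')]
```

For real gzipped files you could pass `gzip.open(path, "rt")` handles instead of the in-memory lists. If the mate-name check fails on the first record, the files are not simple mates and the layout is worth confirming with the sequencing provider.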
Yes, only two files per condition. Looking back at them, I think you're right. Does that mean these are already demultiplexed?
Possibly not. It's puzzling why they are split like that. If you look at that image again, once you group the reads by cell barcode, you then need to count the unique UMIs for each gene. It may be best to confirm with whoever gave you the data.
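The grouping-and-counting step can be sketched as below. It assumes reads have already been aligned and assigned to genes, and takes hypothetical `(cell_barcode, umi, gene)` tuples. UMIs are matched exactly here; real pipelines usually also merge UMIs and cell barcodes that differ by sequencing or synthesis errors, so treat this as an illustration of the idea rather than a replacement for them.

```python
from collections import defaultdict


def count_umis(records):
    """Build a cell -> gene -> count table by counting distinct UMIs.

    records: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Reads sharing the same cell, UMI, and gene are treated as PCR duplicates
    of one molecule and counted once.
    """
    seen = defaultdict(set)  # (cell, gene) -> set of distinct UMIs
    for cell, umi, gene in records:
        seen[(cell, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


if __name__ == "__main__":
    reads = [
        ("AAAACCCCGGGG", "TTTTAAAA", "GeneA"),
        ("AAAACCCCGGGG", "TTTTAAAA", "GeneA"),  # duplicate: same cell, UMI, gene
        ("AAAACCCCGGGG", "CCCCGGGG", "GeneA"),
        ("AAAACCCCGGGG", "GGGGTTTT", "GeneB"),
        ("TTTTGGGGCCCC", "TTTTAAAA", "GeneA"),
    ]
    print(count_umis(reads))
```

Grouping by cell barcode is what separates the cells; any per-condition barcode is a separate sample index used to split the sequencing run into conditions before this step.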